1. Cook K, Lu W, Wang R. Marginal proportional hazards models for clustered interval-censored data with time-dependent covariates. Biometrics 2023; 79:1670-1685. [PMID: 36314377 DOI: 10.1111/biom.13787] [Received: 02/24/2021] [Accepted: 10/18/2022] [Indexed: 11/29/2022]
Abstract
The Botswana Combination Prevention Project was a cluster-randomized HIV prevention trial whose follow-up period coincided with Botswana's national adoption of a universal test and treat strategy for HIV management. Of interest is whether, and to what extent, this change in policy modified the preventative effects of the study intervention. To address such questions, we adopt a stratified proportional hazards model for clustered interval-censored data with time-dependent covariates and develop a composite expectation maximization algorithm that facilitates estimation of model parameters without placing parametric assumptions on either the baseline hazard functions or the within-cluster dependence structure. We show that the resulting estimators for the regression parameters are consistent and asymptotically normal. We also propose and provide theoretical justification for the use of the profile composite likelihood function to construct a robust sandwich estimator for the variance. We characterize the finite-sample performance and robustness of these estimators through extensive simulation studies. Finally, we conclude by applying this stratified proportional hazards model to a re-analysis of the Botswana Combination Prevention Project, with the national adoption of a universal test and treat strategy now modeled as a time-dependent covariate.
Affiliation(s)
- Kaitlyn Cook
  - Program in Statistical and Data Sciences, Smith College, Northampton, Massachusetts, USA
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
- Wenbin Lu
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Rui Wang
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
2. Jang JH, Manatunga A. Diagnostic evaluation of pharmacokinetic features of functional markers. J Biopharm Stat 2023; 33:307-323. [PMID: 36426623 PMCID: PMC10079622 DOI: 10.1080/10543406.2022.2148163] [Received: 04/13/2022] [Accepted: 10/27/2022] [Indexed: 11/26/2022]
Abstract
The dynamicity of functional (curve) markers from modern clinical studies offers deeper insights into complex disease physiology. A frequent clinical practice is to examine various 'pharmacokinetic features' of functional markers (definite integral, maximum value, time to maximum, etc.) that reflect important physiological underpinnings. For instance, the current diagnostic procedure for kidney obstruction is to examine several pharmacokinetic features of renogram curves characterizing renal function. Motivated by such clinical practices, we develop a statistical framework for evaluating diagnostic accuracy of pharmacokinetic features using area under the receiver operating characteristic curve (AUC). The major challenge is that functional markers are observed at discrete time points with measurement error. To address this challenge, we develop a two-stage non-parametric AUC estimator based on summary functionals providing unified representation of various pharmacokinetic features and study its asymptotic properties. We also propose a sensible adaptation of a semiparametric regression model that can describe heterogeneity of AUC across different subpopulations, while appropriately handling discreteness and noise in observed functional markers. Here, a novel data-driven approach that balances between bias and efficiency of the regression coefficient estimates is introduced. Finally, the framework is applied to rigorously evaluate pharmacokinetic features of renogram curves potentially useful for detecting kidney obstruction.
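As a rough illustration of the quantities named in this abstract, the sketch below computes one pharmacokinetic feature (a definite integral, by the trapezoidal rule) from discretely observed curves and then estimates the AUC with the ordinary Mann-Whitney statistic. It is not the two-stage estimator proposed in the article: it ignores measurement error, and the function names and toy curves are hypothetical.

```python
import numpy as np

def trapezoid_integral(t, y):
    """Definite integral of a discretely observed curve (trapezoidal rule)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.diff(t) * (y[1:] + y[:-1]) / 2.0))

def empirical_auc(diseased, healthy):
    """Mann-Whitney estimate of P(feature_diseased > feature_healthy); ties count 1/2."""
    d = np.asarray(diseased, dtype=float)[:, None]
    h = np.asarray(healthy, dtype=float)[None, :]
    return float(np.mean((d > h) + 0.5 * (d == h)))

# Hypothetical toy curves observed at four common time points (illustration only).
t = np.array([0.0, 1.0, 2.0, 3.0])
diseased_curves = [[0, 2, 4, 4], [0, 1, 3, 5]]
healthy_curves = [[0, 2, 1, 0], [0, 3, 2, 1], [0, 1, 3, 5]]

# Summarize each curve by its integral, then compare the two groups.
diseased_feature = [trapezoid_integral(t, y) for y in diseased_curves]
healthy_feature = [trapezoid_integral(t, y) for y in healthy_curves]
auc = empirical_auc(diseased_feature, healthy_feature)
```

Other features mentioned in the abstract, such as the maximum value or the time to maximum, could be substituted for `trapezoid_integral` without changing the AUC step.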
Affiliation(s)
- Jeong Hoon Jang
  - Underwood International College and Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea
- Amita Manatunga
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, USA
3. Strömer A, Klein N, Staerk C, Klinkhammer H, Mayr A. Boosting multivariate structured additive distributional regression models. Stat Med 2023; 42:1779-1801. [PMID: 36932460 DOI: 10.1002/sim.9699] [Received: 05/06/2022] [Revised: 02/03/2023] [Accepted: 02/17/2023] [Indexed: 03/19/2023]
Abstract
We develop a model-based boosting approach for multivariate distributional regression within the framework of generalized additive models for location, scale, and shape. Our approach enables the simultaneous modeling of all distribution parameters of an arbitrary parametric distribution of a multivariate response conditional on explanatory variables, while being applicable to potentially high-dimensional data. Moreover, the boosting algorithm incorporates data-driven variable selection, taking various different types of effects into account. As a special merit of our approach, it allows for modeling the association between multiple continuous or discrete outcomes through the relevant covariates. After a detailed simulation study investigating estimation and prediction performance, we demonstrate the full flexibility of our approach in three diverse biomedical applications. The first is based on high-dimensional genomic cohort data from the UK Biobank, considering a bivariate binary response (chronic ischemic heart disease and high cholesterol). Here, we are able to identify genetic variants that are informative for the association between cholesterol and heart disease. The second application considers the demand for health care in Australia with the number of consultations and the number of prescribed medications as a bivariate count response. The third application analyses two dimensions of childhood undernutrition in Nigeria as a bivariate response and we find that the correlation between the two undernutrition scores is considerably different depending on the child's age and the region the child lives in.
Affiliation(s)
- Annika Strömer
  - Department of Medical Biometrics, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
- Nadja Klein
  - Chair of Uncertainty Quantification and Statistical Learning, Research Center Trustworthy Data Science and Security (UA Ruhr) and Department of Statistics (Technische Universität Dortmund), Dortmund, Germany
- Christian Staerk
  - Department of Medical Biometrics, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
- Hannah Klinkhammer
  - Department of Medical Biometrics, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Andreas Mayr
  - Department of Medical Biometrics, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
4. Pan A, Song X, Huang H. Bayesian analysis for partly linear Cox model with measurement error and time-varying covariate effect. Stat Med 2022; 41:4666-4681. [PMID: 35899596 PMCID: PMC9489624 DOI: 10.1002/sim.9531] [Received: 09/28/2021] [Revised: 05/30/2022] [Accepted: 07/05/2022] [Indexed: 01/07/2023]
Abstract
The Cox proportional hazards model is commonly used to estimate the association between time-to-event and covariates. Under the proportional hazards assumption, covariate effects are assumed to be constant in the follow-up period of study. When measurement error is present, common estimation methods that adjust for an error-contaminated covariate in the Cox proportional hazards model assume that the true function on the covariate is parametric and specified. We consider a semiparametric partly linear Cox model that allows the hazard to depend on an unspecified function of an error-contaminated covariate and an error-free covariate with time-varying effect, which simultaneously relaxes the assumption on the functional form of the error-contaminated covariate and allows for nonconstant effect of the error-free covariate. We take a Bayesian approach and approximate the unspecified function by a B-spline. Simulation studies are conducted to assess the finite sample performance of the proposed approach. The results demonstrate that our proposed method has favorable statistical performance. The proposed method is also illustrated by an application to data from the AIDS Clinical Trials Group Protocol 175.
Affiliation(s)
- Anqi Pan
  - Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, Georgia, USA
- Xiao Song
  - Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, Georgia, USA
- Hanwen Huang
  - Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, Georgia, USA
5. Abeysiri Wickrama Liyanaarachchige PT, Fisher R, Thompson H, Menendez P, Gilmour J, McGree JM. Adaptive monitoring of coral health at Scott Reef where data exhibit nonlinear and disturbed trends over time. Ecol Evol 2022; 12:e9233. [PMID: 36110888 PMCID: PMC9465202 DOI: 10.1002/ece3.9233] [Received: 08/02/2021] [Revised: 07/22/2022] [Accepted: 07/28/2022] [Indexed: 11/23/2022]
Abstract
Time series data are often observed in ecological monitoring. Frequently, such data exhibit nonlinear trends over time potentially due to complex relationships between observed and auxiliary variables, and there may also be sudden declines over time due to major disturbances. This poses substantial challenges for modeling such data and also for adaptive monitoring. To address this, we propose methods for finding adaptive designs for monitoring in such settings. This work is motivated by a monitoring program that has been established at Scott Reef, a coral reef off the western coast of Australia. Data collected for monitoring the health of Scott Reef are considered, and semiparametric and interrupted time series modeling approaches are adopted to describe how these data vary over time. New methods are then proposed that enable adaptive monitoring designs to be found based on such modeling approaches. These methods are then applied to find future monitoring designs at Scott Reef where it was found that future information gain is expected to be similar across a variety of different sites, suggesting that no particular location needs to be prioritized at Scott Reef for the next monitoring phase. In addition, it was found that omitting some sampling sites/reef locations was possible without substantial loss in expected information gain, depending upon the disturbances that were observed. The resulting adaptive designs are used to form recommendations for future monitoring in this region, and for reefs where changes in the current monitoring practices are being sought. As the methods used and developed throughout this study are generic in nature, this research has the potential to improve ecological monitoring more broadly where complex data are being collected over time.
Affiliation(s)
- Pubudu Thilan Abeysiri Wickrama Liyanaarachchige
  - School of Mathematical Sciences, Faculty of Science, Queensland University of Technology (QUT), Brisbane, Queensland, Australia
  - Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), Brisbane, Queensland, Australia
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
  - Department of Mathematics, University of Ruhuna, Matara, Sri Lanka
- Rebecca Fisher
  - Australian Institute of Marine Science, Crawley, Western Australia, Australia
  - Oceans Institute, University of Western Australia, Crawley, Western Australia, Australia
- Helen Thompson
  - School of Mathematical Sciences, Faculty of Science, Queensland University of Technology (QUT), Brisbane, Queensland, Australia
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- Patricia Menendez
  - Department of Econometric and Business Statistics, Monash University, Clayton, Victoria, Australia
  - Australian Institute of Marine Science, Townsville, Queensland, Australia
- James Gilmour
  - Australian Institute of Marine Science, Crawley, Western Australia, Australia
- James M McGree
  - School of Mathematical Sciences, Faculty of Science, Queensland University of Technology (QUT), Brisbane, Queensland, Australia
  - Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), Brisbane, Queensland, Australia
  - Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
6. Liu J, Zhang X, Chen T, Wu T, Lin T, Jiang L, Lang S, Liu L, Natarajan L, Tu J, Kosciolek T, Morton J, Nguyen T, Schnabl B, Knight R, Feng C, Zhong Y, Tu X. A semiparametric model for between-subject attributes: Applications to beta-diversity of microbiome data. Biometrics 2022; 78:950-962. [PMID: 34010477 PMCID: PMC8602427 DOI: 10.1111/biom.13487] [Received: 09/15/2020] [Revised: 04/23/2021] [Accepted: 05/03/2021] [Indexed: 01/25/2023]
Abstract
The human microbiome plays an important role in our health and identifying factors associated with microbiome composition provides insights into inherent disease mechanisms. By amplifying and sequencing the marker genes in high-throughput sequencing, with highly similar sequences binned together, we obtain operational taxonomic units (OTUs) profiles for each subject. Due to the high-dimensionality and nonnormality features of the OTUs, the measure of diversity is introduced as a summarization at the microbial community level, including the distance-based beta-diversity between individuals. Analyses of such between-subject attributes are not amenable to the predominant within-subject-based statistical paradigm, such as t-tests and linear regression. In this paper, we propose a new approach to model beta-diversity as a response within a regression setting by utilizing the functional response models (FRMs), a class of semiparametric models for between- as well as within-subject attributes. The new approach not only addresses limitations of current methods for beta-diversity with cross-sectional data, but also provides a premise for extending the approach to longitudinal and other clustered data in the future. The proposed approach is illustrated with both real and simulated data.
Affiliation(s)
- J. Liu
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
  - Stein Institute for Research on Aging, UC San Diego, San Diego, California, U.S.A.
- X. Zhang
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
- T. Chen
  - Department of Mathematics, University of Toledo, Toledo, Ohio, U.S.A.
- T. Wu
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
  - Stein Institute for Research on Aging, UC San Diego, San Diego, California, U.S.A.
- T. Lin
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
- L. Jiang
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
  - Center for Microbiome Innovation, UC San Diego, San Diego, California, U.S.A.
- S. Lang
  - Department of Medicine, UC San Diego, San Diego, California, U.S.A.
- L. Liu
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
- L. Natarajan
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
- J.X. Tu
  - Physical Medicine and Rehabilitation, University of Virginia Health System, Charlottesville, Virginia, U.S.A.
- T. Kosciolek
  - Department of Pediatrics, UC San Diego, San Diego, California, U.S.A.
  - Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- J. Morton
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, New York, U.S.A.
- T.T. Nguyen
  - Department of Psychiatry, UC San Diego, San Diego, California, U.S.A.
  - Stein Institute for Research on Aging, UC San Diego, San Diego, California, U.S.A.
- B. Schnabl
  - Department of Medicine, UC San Diego, San Diego, California, U.S.A.
- R. Knight
  - Department of Pediatrics, UC San Diego, San Diego, California, U.S.A.
  - Department of Computer Science and Engineering, UC San Diego, San Diego, California, U.S.A.
  - Department of Bioengineering, UC San Diego, San Diego, California, U.S.A.
  - Center for Microbiome Innovation, UC San Diego, San Diego, California, U.S.A.
- C. Feng
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York, U.S.A.
- Y. Zhong
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A.
- X.M. Tu
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
  - Stein Institute for Research on Aging, UC San Diego, San Diego, California, U.S.A.
7. Fritz C, Kauermann G. On the interplay of regional mobility, social connectedness and the spread of COVID-19 in Germany. J R Stat Soc Ser A Stat Soc 2022; 185:400-424. [PMID: 34908652 PMCID: PMC8662283 DOI: 10.1111/rssa.12753] [Received: 08/06/2020] [Accepted: 08/31/2021] [Indexed: 05/12/2023]
Abstract
Since the primary mode of respiratory virus transmission is person-to-person interaction, we are required to reconsider physical interaction patterns to mitigate the number of people infected with COVID-19. While research has shown that non-pharmaceutical interventions (NPI) had an evident impact on national mobility patterns, we investigate the relative regional mobility behaviour to assess the effect of human movement on the spread of COVID-19. In particular, we explore the impact of human mobility and social connectivity derived from Facebook activities on the weekly rate of new infections in Germany between 3 March and 22 June 2020. Our results confirm that reduced social activity lowers the infection rate, accounting for regional and temporal patterns. The extent of social distancing, quantified by the percentage of people staying put within a federal administrative district, has an overall negative effect on the incidence of infections. Additionally, our results show spatial infection patterns based on geographical as well as social distances.
Affiliation(s)
- Cornelius Fritz
  - Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
- Göran Kauermann
  - Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
8.
Abstract
Mean residual life (MRL) function defines the remaining life expectancy of a subject who has survived to a time point and is an important alternative to the hazard function for characterizing the distribution of a time-to-event variable. Existing MRL models primarily focus on studying the association between risk factors and disease risks using linear model specifications in multiplicative or additive scale. When risk factors have complex correlation structures, nonlinear effects, or interactions, the prefixed linearity assumption may be insufficient to capture the relationship. Single-index modeling framework offers flexibility in reducing dimensionality and modeling nonlinear effects. In this article, we propose a class of partially linear single-index generalized MRL models, the regression component of which consists of both a semiparametric single-index part and a linear regression part. Regression spline technique is employed to approximate the nonparametric single-index function, and parameters are estimated using an iterative algorithm. Double-robust estimators are also proposed to protect against the misspecification of censoring distribution or MRL models. A further contribution of this article is a nonparametric test proposed to formally evaluate the linearity of the single-index function. Asymptotic properties of the estimators are established, and the finite-sample performance is evaluated through extensive numerical simulations. The proposed models and inference approaches are demonstrated by a New York University Langone Health (NYULH) COVID-19 dataset.
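To make the definition in the first sentence concrete, the MRL at time t is the expected remaining lifetime among subjects still alive at t, that is, E[T - t | T > t]. The sketch below is a minimal empirical version for fully observed (uncensored) lifetimes. It is not the partially linear single-index model or the double-robust estimator described in the abstract, and the function name and toy data are hypothetical.

```python
import numpy as np

def empirical_mrl(times, t):
    """Empirical mean residual life at t: average of (T_i - t) over subjects with T_i > t.

    Assumes every lifetime is fully observed (no censoring), which is a
    simplification relative to the setting of the article.
    """
    times = np.asarray(times, dtype=float)
    survivors = times[times > t]
    if survivors.size == 0:
        return float("nan")  # nobody survives past t, so the MRL is undefined
    return float(np.mean(survivors - t))

# Hypothetical toy lifetimes (illustration only).
lifetimes = [2.0, 4.0, 6.0, 8.0]
mrl_at_0 = empirical_mrl(lifetimes, 0.0)    # mean lifetime of all four subjects
mrl_at_3 = empirical_mrl(lifetimes, 3.0)    # only subjects with T > 3 contribute
mrl_at_10 = empirical_mrl(lifetimes, 10.0)  # no survivors past t = 10
```

With right-censored data this simple average is biased, which is one reason the abstract's estimators must account for the censoring distribution.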
Affiliation(s)
- Peng Jin
  - Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, NY 10016, U.S.A.
- Mengling Liu
  - Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, NY 10016, U.S.A.
  - Department of Environmental Medicine, New York University Grossman School of Medicine, New York, NY 10016, U.S.A.
9. Aydın D, Ahmed SE, Yılmaz E. Right-Censored Time Series Modeling by Modified Semi-Parametric A-Spline Estimator. Entropy (Basel) 2021; 23:e23121586. [PMID: 34945891 PMCID: PMC8699840 DOI: 10.3390/e23121586] [Received: 10/19/2021] [Revised: 11/20/2021] [Accepted: 11/22/2021] [Indexed: 11/16/2022]
Abstract
This paper focuses on the adaptive spline (A-spline) fitting of the semiparametric regression model to time series data with right-censored observations. Typically, there are two main problems that need to be solved in such a case: dealing with censored data and obtaining a proper A-spline estimator for the components of the semiparametric model. The first problem is traditionally solved by the synthetic data approach based on the Kaplan-Meier estimator. In practice, although the synthetic data technique is one of the most widely used solutions for right-censored observations, the transformed data's structure is distorted, especially for heavily censored datasets, due to the nature of the approach. In this paper, we introduced a modified semiparametric estimator based on the A-spline approach to overcome data irregularity with minimum information loss and to resolve the second problem described above. In addition, the semiparametric B-spline estimator was used as a benchmark method to gauge the success of the A-spline estimator. To this end, a detailed Monte Carlo simulation study and a real data sample were carried out to evaluate the performance of the proposed estimator and to make a practical comparison.
Affiliation(s)
- Dursun Aydın
  - Department of Statistics, Faculty of Science, Mugla Sitki Kocman University, Kotekli 48000, Turkey
- Syed Ejaz Ahmed
  - Department of Mathematics and Statistics, Faculty of Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON L2S 3A1, Canada
- Ersin Yılmaz
  - Department of Statistics, Faculty of Science, Mugla Sitki Kocman University, Kotekli 48000, Turkey
10. Wang L, Wang L. Regression analysis of arbitrarily censored survival data under the proportional odds model. Stat Med 2021; 40:3724-3739. [PMID: 33882618 DOI: 10.1002/sim.8994] [Received: 07/01/2020] [Revised: 02/13/2021] [Accepted: 03/29/2021] [Indexed: 11/09/2022]
Abstract
Arbitrarily censored data are survival data that contain a mixture of exactly observed, left-censored, interval-censored, and right-censored observations. Existing research on regression analysis of arbitrarily censored data is relatively sparse and mainly focused on the proportional hazards model and the accelerated failure time model. This article studies the proportional odds (PO) model and proposes a novel estimation approach through an expectation-maximization (EM) algorithm for analyzing such data. The proposed EM algorithm has many appealing properties such as being robust to initial values, easy to implement, converging fast, and providing the variance estimate of the regression parameter estimate in closed form. An informal diagnostic plot is developed for checking the PO model assumption. Our method has shown excellent performance in estimating the regression parameters as well as the baseline survival function in a simulation study. A real-life dataset about metastatic colorectal cancer is analyzed for illustration. An R package regPO has been created for practitioners to implement our method.
Affiliation(s)
- Lu Wang
  - Department of Mathematics, Western New England University, Springfield, Massachusetts, USA
- Lianming Wang
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
11. Gu E, Zhang J, Lu W, Wang L, Felizzi F. Semiparametric estimation of the cure fraction in population-based cancer survival analysis. Stat Med 2020; 39:3787-3805. [PMID: 32721045 DOI: 10.1002/sim.8693] [Received: 12/11/2018] [Revised: 03/24/2020] [Accepted: 06/21/2020] [Indexed: 11/10/2022]
Abstract
With rapid development in medical research, the treatment of diseases including cancer has progressed dramatically and those survivors may die from causes other than the one under study, especially among elderly patients. Motivated by the Surveillance, Epidemiology, and End Results (SEER) female breast cancer study, background mortality is incorporated into the mixture cure proportional hazards (MCPH) model to improve the cure fraction estimation in population-based cancer studies. Here, patients are defined as "cured" when the mortality rate of individuals in the diseased group returns to the level expected in the general population, where the population-level mortality is given by the mortality table of the United States. The semiparametric estimation method based on the EM algorithm for the MCPH model with background mortality (MCPH+BM) is further developed and validated via comprehensive simulation studies. Real data analysis shows that the proposed semiparametric MCPH+BM model may provide more accurate estimation in population-level cancer studies.
Affiliation(s)
- Ennan Gu
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Jiajia Zhang
  - Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, South Carolina, USA
- Wenbin Lu
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Lianming Wang
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
12. Sun Y, Qi L, Heng F, Gilbert PB. A Hybrid Approach for the Stratified Mark-Specific Proportional Hazards Model with Missing Covariates and Missing Marks, with Application to Vaccine Efficacy Trials. J R Stat Soc Ser C Appl Stat 2020; 69:791-814. [PMID: 33191955 DOI: 10.1111/rssc.12417] [Indexed: 11/26/2022]
Abstract
Deployment of the recently licensed CYD-TDV dengue vaccine requires understanding of how the risk of dengue disease in vaccine recipients depends jointly on a host biomarker measured after vaccination (neutralization titer - NAb) and on a "mark" feature of the dengue disease failure event (the amino acid sequence distance of the dengue virus to the dengue sequence represented in the vaccine). The CYD14 phase 3 trial of CYD-TDV measured NAb via case-cohort sampling and the mark in dengue disease failure events, with about a third missing marks. We addressed the question of interest by developing inferential procedures for the stratified mark-specific proportional hazards model with missing covariates and missing marks. Two hybrid approaches are investigated that leverage both augmented inverse probability weighting and nearest neighborhood hot deck multiple imputation. The two approaches differ in how the imputed marks are pooled in estimation. Our investigation shows that NNHD imputation can lead to biased estimation without properly selected neighborhood. Simulations show that the developed hybrid methods perform well with unbiased NNHD imputations from proper neighborhood selection. The new methods applied to CYD14 show that NAb is strongly inversely associated with risk of dengue disease in vaccine recipients, more strongly against dengue viruses with shorter distances.
Affiliation(s)
- Yanqing Sun
  - University of North Carolina at Charlotte, Charlotte, U.S.A.
- Li Qi
  - Sanofi, Bridgewater, U.S.A.
- Fei Heng
  - University of North Florida, Jacksonville, U.S.A.
- Peter B Gilbert
  - University of Washington and Fred Hutchinson Cancer Research Center, Seattle, U.S.A.
13. Wu H, Wang L. Normal frailty probit model for clustered interval-censored failure time data. Biom J 2019; 61:827-840. [PMID: 30838687 DOI: 10.1002/bimj.201800114] [Received: 04/29/2018] [Revised: 08/09/2018] [Accepted: 09/28/2018] [Indexed: 11/07/2022]
Abstract
Clustered interval-censored data commonly arise in many studies of biomedical research where the failure time of interest is subject to interval-censoring and subjects are correlated for being in the same cluster. A new semiparametric frailty probit regression model is proposed to study covariate effects on the failure time by accounting for the intracluster dependence. Under the proposed normal frailty probit model, the marginal distribution of the failure time is a semiparametric probit model, the regression parameters can be interpreted as both the conditional covariate effects given frailty and the marginal covariate effects up to a multiplicative constant, and the intracluster association can be summarized by two nonparametric measures in simple and explicit form. A fully Bayesian estimation approach is developed based on the use of monotone splines for the unknown nondecreasing function and a data augmentation using normal latent variables. The proposed Gibbs sampler is straightforward to implement since all unknowns have standard form in their full conditional distributions. The proposed method performs very well in estimating the regression parameters as well as the intracluster association, and the method is robust to frailty distribution misspecifications as shown in our simulation studies. Two real-life data sets are analyzed for illustration.
Affiliation(s)
- Lianming Wang
- Department of Statistics, University of South Carolina, Columbia, SC, USA
14
Castro LM, Wang WL, Lachos VH, Inácio de Carvalho V, Bayes CL. Bayesian semiparametric modeling for HIV longitudinal data with censoring and skewness. Stat Methods Med Res 2018; 28:1457-1476. [PMID: 29551086 DOI: 10.1177/0962280218760360] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
In biomedical studies, the analysis of longitudinal data based on Gaussian assumptions is common practice. Nevertheless, more often than not, the observed responses are naturally skewed, rendering the use of symmetric mixed effects models inadequate. In addition, it is also common in clinical assays that the patient's responses are subject to some upper and/or lower quantification limit, depending on the diagnostic assays used for their detection. Furthermore, responses may also often present a nonlinear relation with some covariates, such as time. To address the aforementioned three issues, we consider a Bayesian semiparametric longitudinal censored model based on a combination of splines, wavelets, and the skew-normal distribution. Specifically, we focus on the use of splines to approximate the general mean, wavelets for modeling the individual subject trajectories, and on the skew-normal distribution for modeling the random effects. The newly developed method is illustrated through simulated data and real data concerning AIDS/HIV viral loads.
Affiliation(s)
- Luis M Castro
- 1 Department of Statistics, Pontificia Universidad Católica de Chile, Chile
- Wan-Lun Wang
- 2 Department of Statistics, Graduate Institute of Statistics and Actuarial Science, Feng Chia University, Taichung, Taiwan
- Victor H Lachos
- 3 Department of Statistics, University of Connecticut, Storrs, CT, USA
- Cristian L Bayes
- 5 Department of Sciences, Pontificia Universidad Católica del Perú, Lima, Perú
15
Abstract
Background: A recent focus in the health sciences has been the development of personalized medicine, which includes determining the population for which a given treatment is effective. Due to limited data, identifying the true benefiting population is a challenging task. To tackle this difficulty, the credible subgroups approach provides a pair of bounding subgroups for the true benefiting subgroup, constructed so that one is contained by the benefiting subgroup while the other contains the benefiting subgroup with high probability. However, the method has so far only been developed for parametric linear models. Methods: In this article, we develop the details required to follow the credible subgroups approach in more realistic settings by considering nonlinear and semiparametric regression models, supported for regulatory science by conditional power simulations. We also present an improved multiple testing approach using a step-down procedure. We evaluate our approach via simulations and apply it to data from four trials of Alzheimer's disease treatments carried out by AbbVie. Results: Semiparametric modeling yields credible subgroups that are more robust to violations of linear treatment effect assumptions, and careful choice of the population of interest as well as the step-down multiple testing procedure result in a higher rate of detection of benefiting types of patients. The approach allows us to identify types of patients that benefit from treatment in the Alzheimer's disease trials. Conclusion: Attempts to identify benefiting subgroups of patients in clinical trials are often met with skepticism due to a lack of multiplicity control and unrealistically restrictive assumptions. Our proposed approach merges two techniques, credible subgroups and semiparametric regression, which avoids these problems and makes benefiting subgroup identification practical and reliable.
Affiliation(s)
- Patrick M Schnell
- 1 Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, USA
- Peter Müller
- 2 Department of Mathematics, The University of Texas at Austin, Austin, TX, USA
- Qi Tang
- 3 Former employee of AbbVie, AbbVie, North Chicago, IL, USA
- 4 Sanofi, Bridgewater, NJ, USA
- Bradley P Carlin
- 5 Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
16
Hefley TJ, Broms KM, Brost BM, Buderman FE, Kay SL, Scharf HR, Tipton JR, Williams PJ, Hooten MB. The basis function approach for modeling autocorrelation in ecological data. Ecology 2017; 98:632-646. [PMID: 27935640 DOI: 10.1002/ecy.1674] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Received: 06/16/2016] [Revised: 10/18/2016] [Accepted: 10/24/2016] [Indexed: 11/07/2022]
Abstract
Analyzing ecological data often requires modeling the autocorrelation created by spatial and temporal processes. Many seemingly disparate statistical methods used to account for autocorrelation can be expressed as regression models that include basis functions. Basis functions also enable ecologists to modify a wide range of existing ecological models in order to account for autocorrelation, which can improve inference and predictive accuracy. Furthermore, understanding the properties of basis functions is essential for evaluating the fit of spatial or time-series models, detecting a hidden form of collinearity, and analyzing large data sets. We present important concepts and properties related to basis functions and illustrate several tools and techniques ecologists can use when modeling autocorrelation in ecological data.
Affiliation(s)
- Trevor J Hefley
- Department of Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, Colorado 80523 USA; Department of Statistics, Colorado State University, Fort Collins, Colorado 80523 USA
- Kristin M Broms
- Department of Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, Colorado 80523 USA
- Brian M Brost
- Department of Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, Colorado 80523 USA
- Frances E Buderman
- Department of Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, Colorado 80523 USA
- Shannon L Kay
- Department of Statistics, Colorado State University, Fort Collins, Colorado 80523 USA
- Henry R Scharf
- Department of Statistics, Colorado State University, Fort Collins, Colorado 80523 USA
- John R Tipton
- Department of Statistics, Colorado State University, Fort Collins, Colorado 80523 USA
- Perry J Williams
- Department of Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, Colorado 80523 USA; Department of Statistics, Colorado State University, Fort Collins, Colorado 80523 USA
- Mevin B Hooten
- Department of Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, Colorado 80523 USA; Department of Statistics, Colorado State University, Fort Collins, Colorado 80523 USA; U.S. Geological Survey, Colorado Cooperative Fish and Wildlife Research Unit, Fort Collins, Colorado 80523 USA
17
Abstract
The Net Reclassification Improvement (NRI) and the Integrated Discrimination Improvement (IDI) are used to evaluate the diagnostic accuracy improvement for biomarkers in a wide range of applications. Most applications for these reclassification metrics are confined to nested model comparison. We emphasize the important extensions of these metrics to the non-nested comparison. Non-nested models are important in practice, in particular, in high-dimensional data analysis and in sophisticated semiparametric modeling. We demonstrate that the assessment of accuracy improvement may follow the familiar NRI and IDI evaluation. While the statistical properties of the estimators for NRI and IDI have been well studied in the nested setting, one cannot always rely on these asymptotic results to implement the inference procedure for practical data, especially for testing the null hypothesis of no improvement, and these properties have not been established for the non-nested setting. We propose a generic bootstrap re-sampling procedure for the construction of confidence intervals and hypothesis tests. Extensive simulations and real biomedical data examples illustrate the applicability of the proposed inference methods for both nested and non-nested models.
Affiliation(s)
- Fang Shao
- a Department of Epidemiology and Biostatistics, Nanjing Medical University, Nanjing, People's Republic of China
18
Lee CYY, Wand MP. Variational methods for fitting complex Bayesian mixed effects models to health data. Stat Med 2016; 35:165-88. [PMID: 26415742 DOI: 10.1002/sim.6737] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/07/2015] [Revised: 08/05/2015] [Accepted: 08/27/2015] [Indexed: 11/10/2022]
Abstract
We consider approximate inference methods for Bayesian inference to longitudinal and multilevel data within the context of health science studies. The complexity of these grouped data often necessitates the use of sophisticated statistical models. However, the large size of these data can pose significant challenges for model fitting in terms of computational speed and memory storage. Our methodology is motivated by a study that examines trends in cesarean section rates in the largest state of Australia, New South Wales, between 1994 and 2010. We propose a group-specific curve model that encapsulates the complex nonlinear features of the overall and hospital-specific trends in cesarean section rates while taking into account hospital variability over time. We use penalized spline-based smooth functions that represent trends and implement a fully mean field variational Bayes approach to model fitting. Our mean field variational Bayes algorithms allow a fast (up to the order of thousands) and streamlined analytical approximate inference for complex mixed effects models, with minor degradation in accuracy compared with the standard Markov chain Monte Carlo methods.
Affiliation(s)
- Cathy Yuen Yi Lee
- School of Mathematical and Physical Sciences, University of Technology Sydney, Ultimo, New South Wales, 2007, Australia
- Matt P Wand
- School of Mathematical and Physical Sciences, University of Technology Sydney, Ultimo, New South Wales, 2007, Australia
19
Szczesniak RD, Li D, Amin RS. Semiparametric Mixed Models for Nested Repeated Measures Applied to Ambulatory Blood Pressure Monitoring Data. J Mod Appl Stat Methods 2016; 15:255-275. [PMID: 28936131 DOI: 10.22237/jmasm/1462075980] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
Semiparametric mixed models are increasingly popular for statistical analysis of medical device studies in which long sequences of repeated measurements are recorded. Monitoring these sequences at different periods over time on the same individual, such as before and after an intervention, results in nested repeated measures (NRM). Covariance models to account for NRM and simultaneously address mean profile estimation with penalized splines via semiparametric regression are considered with application to a prospective study of 24-hour ambulatory blood pressure and the impact of surgical intervention on obstructive sleep apnea.
Affiliation(s)
- Dan Li
- University of Cincinnati, Cincinnati, OH
20
Abstract
Health care utilization is an outcome of interest in health services research. Two frequently studied forms of utilization are counts of emergency department (ED) visits and hospital admissions. These counts collectively convey a sense of disease exacerbation and cost escalation. Different types of event counts from the same patient form a vector of correlated outcomes. Traditional analyses typically model such outcomes one at a time, ignoring the natural correlations between different events, and thus fail to provide a full picture of patient care utilization. In this research, we propose a multivariate semiparametric modeling framework for the analysis of multiple health care events following the exponential family of distributions in a longitudinal setting. Bivariate nonparametric functions are incorporated to assess the concurrent nonlinear influences of independent variables as well as their interaction effects on the outcomes. The smooth functions are estimated using thin plate regression splines. A maximum penalized likelihood method is used for parameter estimation. The performance of the proposed method was evaluated through simulation studies. To illustrate the method, we analyzed data from a clinical trial in which ED visits and hospital admissions were considered as bivariate outcomes.
Affiliation(s)
- Zhuokai Li
- 1 Duke Clinical Research Institute, Durham, NC, USA
- Hai Liu
- 2 Gilead Sciences, Inc., Foster City, CA, USA
- Wanzhu Tu
- 3 Department of Biostatistics, Indiana University Center for Aging Research, Indiana University School of Medicine, Indianapolis, IN, USA
21
Chen J, Liu L, Shih YCT, Zhang D, Severini TA. A flexible model for correlated medical costs, with application to medical expenditure panel survey data. Stat Med 2015; 35:883-94. [PMID: 26403805 DOI: 10.1002/sim.6743] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/18/2014] [Revised: 08/25/2015] [Accepted: 08/31/2015] [Indexed: 11/07/2022]
Abstract
We propose a flexible model for correlated medical cost data with several appealing features. First, the mean function is partially linear. Second, the distributional form for the response is not specified. Third, the covariance structure of correlated medical costs has a semiparametric form. We use extended generalized estimating equations to simultaneously estimate all parameters of interest. B-splines are used to estimate unknown functions, and a modification to Akaike information criterion is proposed for selecting knots in spline bases. We apply the model to correlated medical costs in the Medical Expenditure Panel Survey dataset. Simulation studies are conducted to assess the performance of our method.
Affiliation(s)
- Jinsong Chen
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, U.S.A
- Lei Liu
- Department of Preventive Medicine, Northwestern University, Chicago, IL, U.S.A
- Ya-Chen T Shih
- Department of Medicine, The University of Chicago, Chicago, IL, U.S.A
- Daowen Zhang
- Department of Statistics, North Carolina State University, Raleigh, NC, U.S.A
- Thomas A Severini
- Department of Statistics, Northwestern University, Evanston, IL, U.S.A
22
Hobbs BP, Ng CS. Inferring Stable Acquisition Durations for Applications of Perfusion Imaging in Oncology. Cancer Inform 2015; 14:193-9. [PMID: 26052222 PMCID: PMC4444141 DOI: 10.4137/cin.s17280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/18/2014] [Revised: 04/08/2015] [Accepted: 04/11/2015] [Indexed: 02/07/2023] Open
Abstract
Tissue perfusion plays a critical role in oncology. Growth and migration of cancerous cells require proliferation of networks of new blood vessels through the process of tumor angiogenesis. Many imaging technologies developed recently attempt to measure characteristics pertaining to the passage of fluid through blood vessels, thereby providing a noninvasive means for cancer detection, as well as treatment prognostication, prediction, and monitoring. However, because these techniques require a sequence of successive imaging scans under administration of intravenous imaging tracers, the quality of the resulting perfusion data depends on the acquisition protocol. In this paper, we explain how to infer stability for stochastic curve estimation. The topic is motivated by two recent attempts to determine stable acquisition durations for acquiring perfusion characteristics using dynamic computed tomography, wherein inference used inappropriate statistical methods. Notably, when appropriate statistical techniques are used, the resulting conclusions deviate substantially from those previously reported in the literature.
Affiliation(s)
- Brian P Hobbs
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Chaan S Ng
- Department of Diagnostic Radiology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
23
Abstract
Interval-censored time-to-event data often occur in studies of diseases where the symptoms of interest are not directly observable but require lab examinations for detection. Furthermore, the independence assumption among observations may not be valid if they are from clusters. Some methods have been developed for analysing clustered interval-censored data with a shared frailty to account for overall heterogeneity. In this paper, we propose a multiple frailty proportional hazards model, where we not only account for the baseline heterogeneity and effect variation across clusters for predictors, but also quantify the probabilities of the existence of such frailties. This proposed model will be especially useful for analysing multi-center randomised clinical trials for HIV, infections or progression-free survival in oncology studies.
Affiliation(s)
- Chun Pan
- 1 Novartis Pharmaceuticals Corporation, East Hanover, NJ, USA
- Bo Cai
- 2 Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA
- Lianming Wang
- 3 Department of Statistics, University of South Carolina, Columbia, SC, USA
24
Abstract
Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis.
25
Wand H, Ramjee G. Evaluating HIV prevention efforts using semiparametric regression models: results from a large cohort of women participating in an HIV prevention trial from KwaZulu-Natal, South Africa. J Int AIDS Soc 2013; 16:18589. [PMID: 24280372 PMCID: PMC3841298 DOI: 10.7448/ias.16.1.18589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/12/2013] [Revised: 09/08/2013] [Accepted: 10/02/2013] [Indexed: 11/08/2022] Open
Abstract
OBJECTIVE: To describe and quantify the differences in risk behaviours, HIV prevalence and incidence rates by birth cohorts among a group of women in Durban, South Africa. METHODS: Cross-sectional and prospective cohort analyses were conducted for women who consented to be screened and enrolled in an HIV prevention trial. Demographic and sexual behaviours were described by five-year birth cohorts. Semiparametric regression models were used to investigate the bivariate associations between these factors and the birth cohorts. HIV seroconversion rates were also estimated by birth cohorts. RESULTS: The prevalence of HIV-1 infection at the screening visit was lowest (20.0%) among the oldest (born before 1960) cohorts, while the highest prevalence was observed among those born between 1975 and 1979. Level of education increased across the birth cohorts while the median age at first sexual experience declined among those born after 1975 compared to those born before 1975. Only 33.03% of the oldest group reported ever using a condom while engaging in vaginal sex compared to 73.68% in the youngest group; however, HIV and other sexually transmitted infection (STI) incidence rates were significantly higher among younger women compared to older women. CONCLUSIONS: These findings clearly suggest that demographic and sexual risk behaviours are differentially related to the birth cohorts. Significantly high HIV and STI incidence rates were observed among the younger group. Although the level of education increased, early age at sexual debut was more common among the younger group. The continuing increase in HIV and STI incidence rates among the later cohorts suggests that the future trajectory of the epidemic will be dependent on the infection patterns in younger birth cohorts.
Affiliation(s)
- Handan Wand
- The Kirby Institute, University of New South Wales, Sydney, New South Wales, Australia
- Gita Ramjee
- HIV Prevention Research Unit, Medical Research Council, Durban, South Africa
26
Chen J, Liu L, Zhang D, Shih YCT. A flexible model for the mean and variance functions, with application to medical cost data. Stat Med 2013; 32:4306-18. [PMID: 23670952 PMCID: PMC4669967 DOI: 10.1002/sim.5838] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 06/22/2012] [Revised: 03/24/2013] [Accepted: 03/27/2013] [Indexed: 11/09/2022]
Abstract
Medical cost data are often skewed to the right and heteroscedastic, having a nonlinear relation with covariates. To tackle these issues, we consider an extension to generalized linear models by assuming nonlinear associations of covariates in the mean function and allowing the variance to be an unknown but smooth function of the mean. We make no further assumption on the distributional form. The unknown functions are described by penalized splines, and the estimation is carried out using nonparametric quasi-likelihood. Simulation studies show the flexibility and advantages of our approach. We apply the model to the annual medical costs of heart failure patients in the clinical data repository at the University of Virginia Hospital System.
Affiliation(s)
- Jinsong Chen
- Department of Preventive Medicine, Northwestern University, Chicago, U.S.A
- Lei Liu
- Department of Preventive Medicine, Northwestern University, Chicago, U.S.A
- Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, U.S.A
- Daowen Zhang
- Department of Statistics, North Carolina State University, Raleigh, NC, U.S.A
- Ya-Chen T. Shih
- Department of Medicine, University of Chicago, Chicago, IL, U.S.A
27
McMahan CS, Wang L, Tebbs JM. Regression analysis for current status data using the EM algorithm. Stat Med 2013; 32:4452-66. [PMID: 23761135 DOI: 10.1002/sim.5863] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 07/11/2012] [Accepted: 05/06/2013] [Indexed: 11/12/2022]
Abstract
We propose new expectation-maximization algorithms to analyze current status data under two popular semiparametric regression models: the proportional hazards (PH) model and the proportional odds (PO) model. Monotone splines are used to model the baseline cumulative hazard function in the PH model and the baseline odds function in the PO model. The proposed algorithms are derived by exploiting a data augmentation based on Poisson latent variables. Unlike previous regression work with current status data, our PH and PO model fitting methods are fast, flexible, easy to implement, and provide variance estimates in closed form. These techniques are evaluated using simulation and are illustrated using uterine fibroid data from a prospective cohort study on early pregnancy.
28
Yu Z, Liu L, Bravata DM, Williams LS, Tepper RS. A semiparametric recurrent events model with time-varying coefficients. Stat Med 2013; 32:1016-26. [PMID: 22903343 PMCID: PMC4641519 DOI: 10.1002/sim.5575] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 09/30/2011] [Accepted: 07/28/2012] [Indexed: 11/07/2022]
Abstract
We consider a recurrent events model with time-varying coefficients motivated by two clinical applications. We use a random effects (Gaussian frailty) model to describe the intensity of recurrent events. The model can accommodate both time-varying and time-constant coefficients. We use the penalized spline method to estimate the time-varying coefficients. We use Laplace approximation to evaluate the penalized likelihood without a closed form. We estimate the smoothing parameters in a similar way to variance components. We conduct simulations to evaluate the performance of the estimates for both time-varying and time-independent coefficients. We apply this method to analyze two data sets: a stroke study and a child wheeze study.
Affiliation(s)
- Zhangsheng Yu
- Department of Biostatistics, Indiana University School of Medicine, Indianapolis, IN, USA.
29
Collier BA, Groce JE, Morrison ML, Newnam JC, Campomizzi AJ, Farrell SL, Mathewson HA, Snelgrove RT, Carroll RJ, Wilkins RN. Predicting patch occupancy in fragmented landscapes at the rangewide scale for an endangered species: an example of an American warbler. Divers Distrib 2012; 18:158-167. [PMID: 22408381 PMCID: PMC3298116 DOI: 10.1111/j.1472-4642.2011.00831.x] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 11/30/2022] Open
Abstract
AIM: Our objective was to identify the distribution of the endangered golden-cheeked warbler (Setophaga chrysoparia) in fragmented oak-juniper woodlands by applying a geoadditive semiparametric occupancy model to better assist decision-makers in identifying suitable habitat across the species breeding range on which conservation or mitigation activities can be focused and thus prioritize management and conservation planning. LOCATION: Texas, USA. METHODS: We used repeated double-observer detection/non-detection surveys of randomly selected (n = 287) patches of potential habitat to evaluate warbler patch-scale presence across the species breeding range. We used a geoadditive semiparametric occupancy model with remotely sensed habitat metrics (patch size and landscape composition) to predict patch-scale occupancy of golden-cheeked warblers in the fragmented oak-juniper woodlands of central Texas, USA. RESULTS: Our spatially explicit model indicated that golden-cheeked warbler patch occupancy declined from south to north within the breeding range concomitant with reductions in the availability of large habitat patches. We found that 59% of woodland patches, primarily in the northern and central portions of the warbler's range, were predicted to have occupancy probabilities ≤0.10 with only 3% of patches predicted to have occupancy probabilities >0.90. Our model exhibited high prediction accuracy (area under curve = 0.91) when validated using independently collected warbler occurrence data. MAIN CONCLUSIONS: We have identified a distinct spatial occurrence gradient for golden-cheeked warblers as well as a relationship between two measurable landscape characteristics. Because habitat-occupancy relationships were key drivers of our model, our results can be used to identify potential areas where conservation actions supporting habitat mitigation can occur and identify areas where conservation of future potential habitat is possible. 
Additionally, our results can be used to focus resources on maintenance and creation of patches that are more likely to harbour viable local warbler populations.
Affiliation(s)
- Bret A. Collier
- Institute of Renewable Natural Resources, Texas A&M University, College Station, TX 77843, USA
- Julie E. Groce
- Institute of Renewable Natural Resources, Texas A&M University, San Antonio, TX 77843, USA
- Michael L. Morrison
- Department of Wildlife and Fisheries Sciences, Texas A&M University, College Station, TX 77843, USA
- John C. Newnam
- Texas Department of Transportation, PO Box 15426, Austin, TX 78761, USA
- Andrew J. Campomizzi
- Department of Wildlife and Fisheries Sciences, Texas A&M University, College Station, TX 77843, USA
- Shannon L. Farrell
- Department of Wildlife and Fisheries Sciences, Texas A&M University, College Station, TX 77843, USA
- Heather A. Mathewson
- Institute of Renewable Natural Resources, Texas A&M University, College Station, TX 77843, USA
- Robert T. Snelgrove
- Institute of Renewable Natural Resources, Texas A&M University, College Station, TX 77843, USA
- Raymond J. Carroll
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- Robert N. Wilkins
- Institute of Renewable Natural Resources, Texas A&M University, College Station, TX 77843, USA
30
Abstract
In many biomedical investigations, a primary goal is the identification of subjects who are susceptible to a given exposure or treatment of interest. We focus on methods for addressing this question in longitudinal studies when interest focuses on relating susceptibility to a subject's baseline or mean outcome level. In this context, we propose a random intercepts-functional slopes model that relaxes the assumption of linear association between random coefficients in existing mixed models and yields an estimate of the functional form of this relationship. We propose a penalized spline formulation for the nonparametric function that represents this relationship, and implement a fully Bayesian approach to model fitting. We investigate the frequentist performance of our method via simulation, and apply the model to data on the effects of particulate matter on coronary blood flow from an animal toxicology study. The general principles introduced here apply more broadly to settings in which interest focuses on the relationship between baseline and change over time.
Affiliation(s)
- Brent A Coull
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115, USA.
|
31
|
Su L, Hogan JW. HIV dynamics and natural history studies: joint modeling with doubly interval-censored event time and infrequent longitudinal data. Ann Appl Stat 2011; 5:400-426. [PMID: 27134691 DOI: 10.1214/10-aoas391] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
Abstract
Hepatitis C virus (HCV) coinfection has become one of the most challenging clinical situations to manage in HIV-infected patients. Recently the effect of HCV coinfection on HIV dynamics following initiation of highly active antiretroviral therapy (HAART) has drawn considerable attention. Post-HAART HIV dynamics are commonly studied in short-term clinical trials with frequent data collection design. For example, the elimination process of plasma virus during treatment is closely monitored with daily assessments in viral dynamics studies of AIDS clinical trials. In this article instead we use infrequent cohort data from long-term natural history studies and develop a model for characterizing post-HAART HIV dynamics and their associations with HCV coinfection. Specifically, we propose a joint model for doubly interval-censored data for the time between HAART initiation and viral suppression, and the longitudinal CD4 count measurements relative to the viral suppression. Inference is accomplished using a fully Bayesian approach. Doubly interval-censored data are modeled semiparametrically by Dirichlet process priors and Bayesian penalized splines are used for modeling population-level and individual-level mean CD4 count profiles. We use the proposed methods and data from the HIV Epidemiology Research Study (HERS) to investigate the effect of HCV coinfection on the response to HAART.
Affiliation(s)
- Li Su
- MRC Biostatistics Unit, Robinson Way, Cambridge CB2 0SR, UK
- Joseph W Hogan
- Center for Statistical Sciences, Department of Community Health, Brown University, Box G-S121-7, Providence, Rhode Island 02912, USA
|
32
|
Calderon CP, Martinez JG, Carroll RJ, Sorensen DC. P-splines using derivative information. Multiscale Model Simul 2010; 8:1562-1580. [PMID: 21691592 PMCID: PMC3117255 DOI: 10.1137/090768102] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
Time series associated with single-molecule experiments and/or simulations contain a wealth of multiscale information about complex biomolecular systems. We demonstrate how a collection of Penalized-splines (P-splines) can be useful in quantitatively summarizing such data. In this work, functions estimated using P-splines are associated with stochastic differential equations (SDEs). It is shown how quantities estimated in a single SDE summarize fast-scale phenomena, whereas variation between curves associated with different SDEs partially reflects noise induced by motion evolving on a slower time scale. P-splines assist in "semiparametrically" estimating nonlinear SDEs in situations where a time-dependent external force is applied to a single-molecule system. The P-splines introduced simultaneously use function and derivative scatterplot information to refine curve estimates. We refer to the approach as the PuDI (P-splines using Derivative Information) method. It is shown how generalized least squares ideas fit seamlessly into the PuDI method. Applications demonstrating how utilizing uncertainty information/approximations along with generalized least squares techniques improve PuDI fits are presented. Although the primary application here is in estimating nonlinear SDEs, the PuDI method is applicable to situations where both unbiased function and derivative estimates are available.
Affiliation(s)
- Josue G. Martinez
- Department of Statistics, Texas A&M University, College Station, TX 77843
- Raymond J. Carroll
- Department of Statistics, Texas A&M University, College Station, TX 77843
- Danny C. Sorensen
- Department of Computational and Applied Mathematics, Rice University, Houston, TX 77005
- Department of Computational and Applied Mathematics, Rice University, Houston, TX 77005
|