1
Kamyari N, Soltanian AR, Mahjub H, Moghimbeigi A, Seyedtabib M. Zero-augmented beta-prime model for multilevel semi-continuous data: a Bayesian inference. BMC Med Res Methodol 2022; 22:283. [PMID: 36324066 PMCID: PMC9628168 DOI: 10.1186/s12874-022-01736-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 09/27/2022] [Indexed: 01/24/2023] Open
Abstract
Semi-continuous data characterized by an excessive proportion of zeros and right-skewed continuous positive values appear frequently in medical research. One example is pharmaceutical expenditure (PE) data, for which a substantial proportion of the subjects investigated may report zero. Two-part mixed-effects models have been developed to analyse clustered measures of semi-continuous data from multilevel studies. In this study, we propose a new flexible two-part mixed-effects model with skew distributions for nested semi-continuous cost data under the framework of a Bayesian approach. The proposed model specification consists of two mixed-effects models linked by the correlated random effects: Part I) a model on the occurrence of positive values using a generalized logistic mixed model; and Part II) a model on the magnitude of positive values using a linear mixed model where the model errors follow skew distributions including beta-prime (BP). The proposed method is illustrated with pharmaceutical expenditure data from a multilevel observational study and the analytic results are reported by comparing potential models under different skew distributions. Simulation studies are conducted to assess the performance of the proposed model. DIC3, LPML, WAIC, and LOO are used as the Bayesian model selection criteria, together with measures of divergence, to compare the models.
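The two-part structure described in this abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model: the function name `simulate_two_part`, all parameter values, and the use of a log-normal distribution for the positive part (a stand-in for the beta-prime and other skew distributions) are hypothetical choices made here for illustration.

```python
import math
import random


def simulate_two_part(n_clusters=200, n_per_cluster=5, rho=0.5, seed=1):
    """Simulate clustered semi-continuous outcomes from a two-part model.

    All numeric settings below are hypothetical illustration values.
    """
    rng = random.Random(seed)
    alpha0, beta0 = -0.5, 1.0            # fixed intercepts for Part I and Part II
    sd_a, sd_b, sd_eps = 1.0, 0.5, 0.7   # random-effect and error standard deviations
    data = []
    for cluster in range(n_clusters):
        # Correlated random intercepts (correlation rho) link the two parts.
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a_i = sd_a * z1
        b_i = sd_b * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        for _ in range(n_per_cluster):
            # Part I: logistic model for the probability of a positive value.
            p_pos = 1.0 / (1.0 + math.exp(-(alpha0 + a_i)))
            if rng.random() < p_pos:
                # Part II: log-normal magnitude (assumed stand-in for a skew distribution).
                y = math.exp(beta0 + b_i + rng.gauss(0, sd_eps))
            else:
                y = 0.0
            data.append((cluster, y))
    return data


data = simulate_two_part()
zero_share = sum(1 for _, y in data if y == 0.0) / len(data)
print(f"observations: {len(data)}, share of zeros: {zero_share:.2f}")
```

The simulated outcomes show the two features the abstract names: a point mass at zero and right-skewed positive values, with cluster-level dependence between the two parts.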
Affiliation(s)
- Naser Kamyari
  - Department of Biostatistics and Epidemiology, School of Health, Abadan University of Medical Sciences, Abadan, Iran
- Ali Reza Soltanian
  - Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Street of Mahdieh, Hamadan, Iran (GRID: grid.411950.8; ISNI: 0000 0004 0611 9280)
- Hossein Mahjub
  - Research Center for Health Sciences, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran (GRID: grid.411950.8; ISNI: 0000 0004 0611 9280)
- Abbas Moghimbeigi
  - Department of Biostatistics and Epidemiology, School of Health, Research Center for Health, Safety and Environment, Alborz University of Medical Sciences, Karaj, Iran (GRID: grid.411705.6; ISNI: 0000 0001 0166 0922)
- Maryam Seyedtabib
  - Department of Biostatistics and Epidemiology, School of Health, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran (GRID: grid.411230.5; ISNI: 0000 0000 9296 6873)
2
Ren J, Tapert S, Fan CC, Thompson WK. A semi-parametric Bayesian model for semi-continuous longitudinal data. Stat Med 2022; 41:2354-2374. [PMID: 35274335 DOI: 10.1002/sim.9359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Revised: 01/21/2022] [Accepted: 02/03/2022] [Indexed: 11/11/2022]
Abstract
Semi-continuous data present challenges in both model fitting and interpretation. Parametric distributions may be inappropriate for extreme long right tails of the data. Mean effects of covariates, susceptible to extreme values, may fail to capture relevant information for most of the sample. We propose a two-component semi-parametric Bayesian mixture model, with the discrete component captured by a probability mass (typically at zero) and the continuous component of the density modeled by a mixture of B-spline densities that can be flexibly fit to any data distribution. The model includes random effects of subjects to allow for application to longitudinal data. We specify prior distributions on parameters and perform model inference using a Markov chain Monte Carlo (MCMC) Gibbs-sampling algorithm programmed in R. Statistical inference can be made for multiple quantiles of the covariate effects simultaneously providing a comprehensive view. Various MCMC sampling techniques are used to facilitate convergence. We demonstrate the performance and the interpretability of the model via simulations and analyses on the National Consortium on Alcohol and Neurodevelopment in Adolescence study (NCANDA) data on alcohol binge drinking.
Affiliation(s)
- Junting Ren
  - Division of Biostatistics, Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, California, USA
  - Population Neuroscience and Genetics Lab, University of California San Diego, La Jolla, California, USA
- Susan Tapert
  - Department of Psychiatry, University of California San Diego, La Jolla, California, USA
- Chun Chieh Fan
  - Population Neuroscience and Genetics Lab, University of California San Diego, La Jolla, California, USA
  - Center for Human Development, University of California San Diego, La Jolla, California, USA
- Wesley K Thompson
  - Population Neuroscience and Genetics Lab, University of California San Diego, La Jolla, California, USA
  - Department of Radiology, University of California San Diego, La Jolla, California, USA
3
Jaffa MA, Gebregziabher M, Jaffa AA. Shared parameter and copula models for analysis of semicontinuous longitudinal data with nonrandom dropout and informative censoring. Stat Methods Med Res 2022; 31:451-474. [PMID: 34806502 PMCID: PMC8891057 DOI: 10.1177/09622802211060519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Analysis of longitudinal semicontinuous data characterized by subjects' attrition triggered by nonrandom dropout is complex and requires accounting for the within-subject correlation and modeling of the dropout process. While methods that address the within-subject correlation and missing data are available, approaches that incorporate the nonrandom dropout, also referred to as informative right censoring, in the modeling step are scarce due to the computational intensity and possibly intractable integration needed for their implementation. Appreciating the complexity of this problem and the need for a new methodology that is feasible for implementation, we propose to extend a framework of likelihood-based marginalized two-part models to account for informative right censoring. The censoring process is modeled using two approaches: (1) Poisson censoring for the count of visits before dropout and (2) survival time to dropout. Novel consideration was given to the proposed joint modeling approaches for the semicontinuous and censoring components of the likelihood function, which included (1) shared parameter and (2) Clayton copula. The cross-part and within-part correlations were accounted for through a complex random effect structure that models correlated random intercepts and slopes. The feasibility of implementation and the accuracy of these approaches were investigated using extensive simulation studies and a clinical application.
Affiliation(s)
- Miran A. Jaffa
  - Epidemiology and Population Health Department, Faculty of Health Sciences, American University of Beirut, Beirut, Lebanon, P.O.Box 11-0236 Riad El-Solh / Beirut, Lebanon 1107 2020
- Mulugeta Gebregziabher
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Ayad A. Jaffa
  - Department of Biochemistry and Molecular Genetics, Faculty of Medicine, American University of Beirut, Beirut, Lebanon, P.O.Box 11-0236 Riad El-Solh / Beirut, Lebanon 1107 2020
  - Department of Medicine, Medical University of South Carolina, Charleston, SC 29425, USA
4
Feng T, Boyle LN. Sparse group regularization for semi-continuous transportation data. Stat Med 2021; 40:3267-3285. [PMID: 33843070 DOI: 10.1002/sim.8942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2020] [Revised: 01/17/2021] [Accepted: 02/16/2021] [Indexed: 11/08/2022]
Abstract
Motor vehicle crashes are a global public health concern. Most analyses have used zero-inflated count models for examining crash counts. However, few methods are available to account for safety metrics that have semi-continuous observations. This article considers the problem of variable selection for semi-continuous zero-inflated (SCZI) models. These models include two parts: a zero-inflated part and a nonzero continuous part. A special group regularization is designed to accommodate the unique structure of two-part SCZI models, and a type of Bayesian information criterion is proposed to select tuning parameters. We illustrate the variable selection process of the proposed model using lane position data from a driving simulator study. In the study, drivers stay in the intended lane for the majority of their drive (zero-inflated part). On occasion, some drivers do drift out of their intended driving lane (nonzero continuous part). Our findings show that individual differences can be captured with the proposed model, which has implications for driving safety and the design of in-vehicle alerting systems.
Affiliation(s)
- Tianshu Feng
  - Industrial and Systems Engineering, University of Washington, Seattle, Washington, USA
- Linda Ng Boyle
  - Industrial and Systems Engineering, University of Washington, Seattle, Washington, USA
5
Park J, Choi T, Chung Y. Nonparametric Bayesian functional two-part random effects model for longitudinal semicontinuous data analysis. Biom J 2021; 63:787-805. [PMID: 33554393 DOI: 10.1002/bimj.201900280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2019] [Revised: 04/23/2020] [Accepted: 07/17/2020] [Indexed: 11/08/2022]
Abstract
Longitudinal semicontinuous data, characterized by repeated measures with a large proportion of zeros and continuous positive values, are frequently encountered in many applications including biomedical, epidemiological, and social science studies. Two-part random effects models (TPREM) have been used to investigate the association between such longitudinal semicontinuous data and covariates, accounting for the within-subject correlation. The existing TPREM is, however, limited in its ability to incorporate a functional covariate, which is often available in a longitudinal study. Moreover, the existing TPREM typically assumes the normality of subject-specific random effects, which can be easily violated when there exists a subgroup structure. In this article, we propose a nonparametric Bayesian functional TPREM to assess the relationship between the longitudinal semicontinuous outcome and various types of covariates including a functional covariate. The proposed model also relaxes the normality assumption for the random effects through a Dirichlet process mixture of normals, which allows for identifying an underlying subgroup structure. The methodology is illustrated through an application to social insurance expenditure data collected by the Korean Welfare Panel Study and a simulation study.
Affiliation(s)
- Jinsu Park
  - Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Korea
- Taeryon Choi
  - Department of Statistics, Korea University, Seoul, Korea
- Yeonseung Chung
  - Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Korea
6
Xin Y, Jiang J, Chen S, Gong F, Xiang L. What contributes to medical debt? Evidence from patients in rural China. BMC Health Serv Res 2020; 20:696. [PMID: 32723325 PMCID: PMC7388505 DOI: 10.1186/s12913-020-05551-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2019] [Accepted: 07/17/2020] [Indexed: 12/01/2022] Open
Abstract
BACKGROUND: Rural households in developing countries usually have severe medical debt due to high out-of-pocket (OOP) payments, which contributes to bankruptcy. China implemented the critical illness insurance (CII) in 2012 to decrease patients' medical expenditure. This paper aimed to explore the medical debt of rural Chinese patients and its influencing factors. METHODS: A questionnaire survey of health expenditures and medical debt was conducted in two counties of Central and Western China in 2017. Patients who received CII were used as the sample on the basis of multi-stage stratified cluster sampling. Descriptive statistics and multivariate analysis of variance were applied to all data. A two-part model was used to evaluate the occurrence and extent of medical debt. RESULTS: A total of 826 rural patients with CII were surveyed. The percentage of patients incurring medical debt exceeded 50% and the median debt load was 20,000 Chinese yuan (CNY, 650 CNY = US$100). Financial assistance from kin (P < 0.001) decreased the likelihood of medical debt. High inpatient expenses (IEs, P < 0.01), CII reimbursement ratio (P < 0.001), and non-direct medical costs (P < 0.001) resulted in increased medical debt load. CONCLUSIONS: Medical debt is still one of the biggest problems in rural China. High IEs, CII reimbursement ratio, and municipal or higher-level hospitals were the risk determinants of medical debt load. Financial assistance from kin and household income were the protective factors. Increasing the service capability of hospitals in counties could keep more patients in county-level and township hospitals. Improving CII with an increased reimbursement rate may also be an issue of concern.
Affiliation(s)
- Yanjiao Xin
  - School of Medicine and Health Management, Huazhong University of Science and Technology, 13 Hangkong Road, Qiaokou District, Wuhan, 430030, China
- Junnan Jiang
  - School of Medicine and Health Management, Huazhong University of Science and Technology, 13 Hangkong Road, Qiaokou District, Wuhan, 430030, China
- Shanquan Chen
  - School of Clinical Medicine, University of Cambridge, Cambridgeshire, UK
- Fangxu Gong
  - School of Medicine and Health Management, Huazhong University of Science and Technology, 13 Hangkong Road, Qiaokou District, Wuhan, 430030, China
- Li Xiang
  - School of Medicine and Health Management, Huazhong University of Science and Technology, 13 Hangkong Road, Qiaokou District, Wuhan, 430030, China
7
Quantitative knowledge presentation models of traditional Chinese medicine (TCM): A review. Artif Intell Med 2020; 103:101810. [DOI: 10.1016/j.artmed.2020.101810] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/25/2019] [Revised: 01/11/2020] [Accepted: 01/23/2020] [Indexed: 12/26/2022]
8
Jiang T, Lu Y, Duan H, Zhang W, Liu A. A model-based approach for clustering of multivariate semicontinuous data with application to dietary pattern analysis and intervention. Stat Med 2020; 39:16-25. [PMID: 31702055 DOI: 10.1002/sim.8391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2018] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 11/10/2022]
Abstract
Semicontinuous data, characterized by a sizable number of zeros and observations from a continuous distribution, are frequently encountered in health research concerning food consumption, physical activity, medical and pharmacy claims expenditures, and many others. In analyzing such semicontinuous data, it is imperative that the excessive zeros be adequately accounted for to obtain unbiased and efficient inference. Although many methods have been proposed in the literature for the modeling and analysis of semicontinuous data, little attention has been given to clustering of semicontinuous data to identify important patterns that could be indicative of certain health outcomes or intervention effects. We propose a Bernoulli-normal mixture model for clustering of multivariate semicontinuous data and demonstrate its accuracy as compared to the well-known clustering method with the conventional normal mixture model. The proposed method is illustrated with data from a dietary intervention trial to promote healthy eating behavior among children with type 1 diabetes. In the trial, certain diabetes-friendly foods (eg, total fruit, whole fruit, dark green and orange vegetables and legumes, whole grain) were only consumed by a proportion of study participants, yielding excessive zero values due to nonconsumption of the foods. Baseline food consumption data in the trial are used to explore preintervention dietary patterns among study participants. While the conventional normal mixture model approach fails to do so, the proposed Bernoulli-normal mixture model approach has been shown to be able to identify a dietary profile that significantly differentiates the intervention effects from others, as measured by the popular healthy eating index at the end of the trial.
Affiliation(s)
- Tao Jiang
  - School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, China
- Yahui Lu
  - School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, China
- Huimin Duan
  - School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, China
- Wei Zhang
  - Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland
- Aiyi Liu
  - Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland
9
Liu L, Shih YCT, Strawderman RL, Zhang D, Johnson BA, Chai H. Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review. Stat Sci 2019. [DOI: 10.1214/18-sts681] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 11/19/2022]
10
Albert PS. Shared random parameter models: A legacy of the biostatistics program at the National Heart, Lung, and Blood Institute. Stat Med 2019; 38:501-511. [PMID: 30376693 DOI: 10.1002/sim.8011] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 02/20/2018] [Revised: 08/19/2018] [Accepted: 09/26/2018] [Indexed: 11/07/2022]
Abstract
Shared random parameter models (SRPMs) were first introduced by researchers at the National Heart Lung and Blood Institute (NHLBI) Biostatistics Branch for analyzing longitudinal data with informative dropout (Wu and Carroll, 1987; Wu and Bailey, 1988; Follmann and Wu, 1995; Albert and Follmann, 2000; Albert et al, 2002). This work was all focused on characterizing the longitudinal data process in the presence of an informative missing data mechanism that is treated as a nuisance. Shared random parameter modeling approaches have also been developed from the perspective of characterizing the relationship between longitudinal data and a subsequent outcome that may be an event time, a dichotomous measurement, or another longitudinal outcome. This article will review the early contributions of the NHLBI biostatisticians on SRPMs for analyzing longitudinal data with dropout and demonstrate how these ideas have, more recently, been applied in these other areas of biostatistics. Rather than focus on technical details or specific analyses, this article presents a conceptual framework for SRPMs within a historical context.
Affiliation(s)
- Paul S Albert
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
11
Two-Part Models for Zero-Modified Count and Semicontinuous Data. HEALTH SERVICES EVALUATION 2019. [DOI: 10.1007/978-1-4939-8715-3_39] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
12
Yiu S, Tom BDM. Two-part models with stochastic processes for modelling longitudinal semicontinuous data: Computationally efficient inference and modelling the overall marginal mean. Stat Methods Med Res 2018; 27:3679-3695. [PMID: 28535715 PMCID: PMC5723155 DOI: 10.1177/0962280217710573] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
Abstract
Several researchers have described two-part models with patient-specific stochastic processes for analysing longitudinal semicontinuous data. In theory, such models can offer greater flexibility than the standard two-part model with patient-specific random effects. However, in practice, the high dimensional integrations involved in the marginal likelihood (i.e. integrated over the stochastic processes) significantly complicate model fitting. Thus, only non-standard computationally intensive procedures based on simulating the marginal likelihood have so far been proposed. In this paper, we describe an efficient method of implementation by demonstrating how the high dimensional integrations involved in the marginal likelihood can be computed efficiently. Specifically, by using a property of the multivariate normal distribution and the standard marginal cumulative distribution function identity, we transform the marginal likelihood so that the high dimensional integrations are contained in the cumulative distribution function of a multivariate normal distribution, which can then be efficiently evaluated. Hence, maximum likelihood estimation can be used to obtain parameter estimates and asymptotic standard errors (from the observed information matrix) of model parameters. We describe our proposed efficient implementation procedure for the standard two-part model parameterisation and when it is of interest to directly model the overall marginal mean. The methodology is applied to a psoriatic arthritis data set concerning functional disability.
Affiliation(s)
- Sean Yiu
  - MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Brian DM Tom
  - MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
13
Farewell VT, Long DL, Tom BDM, Yiu S, Su L. Two-Part and Related Regression Models for Longitudinal Data. ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION 2017; 4:283-315. [PMID: 28890906 PMCID: PMC5590716 DOI: 10.1146/annurev-statistics-060116-054131] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Indexed: 05/16/2023]
Abstract
Statistical models that involve a two-part mixture distribution are applicable in a variety of situations. Frequently, the two parts are a model for the binary response variable and a model for the outcome variable that is conditioned on the binary response. Two common examples are zero-inflated or hurdle models for count data and two-part models for semicontinuous data. Recently, there has been particular interest in the use of these models for the analysis of repeated measures of an outcome variable over time. The aim of this review is to consider motivations for the use of such models in this context and to highlight the central issues that arise with their use. We examine two-part models for semicontinuous and zero-heavy count data, and we also consider models for count data with a two-part random effects distribution.
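The two-part structure for semicontinuous data summarized in this abstract can be written compactly. The following is a sketch only; the symbols $\pi$, $g$, and $\mu$ are notation introduced here for illustration and are not taken from the review.

```latex
% Notation assumed here (not from the source):
%   \pi = \Pr(Y > 0), the binary part
%   g   = density of Y conditional on Y > 0, the continuous part
%   \mu = \mathrm{E}[Y \mid Y > 0]
\[
f(y) = (1-\pi)^{\mathbb{1}(y = 0)} \, \bigl[\pi \, g(y)\bigr]^{\mathbb{1}(y > 0)},
\qquad
\mathrm{E}[Y] = \pi \, \mu .
\]
```

The second identity shows why covariate effects on the overall mean involve both parts of the model at once.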
Affiliation(s)
- V T Farewell
  - Medical Research Council Biostatistics Unit, Institute of Public Health, University of Cambridge, Cambridge CB2 0SR, United Kingdom
- D L Long
  - Department of Biostatistics, West Virginia University, Morgantown, West Virginia 26506
- B D M Tom
  - Medical Research Council Biostatistics Unit, Institute of Public Health, University of Cambridge, Cambridge CB2 0SR, United Kingdom
- S Yiu
  - Medical Research Council Biostatistics Unit, Institute of Public Health, University of Cambridge, Cambridge CB2 0SR, United Kingdom
- L Su
  - Medical Research Council Biostatistics Unit, Institute of Public Health, University of Cambridge, Cambridge CB2 0SR, United Kingdom
14
Two-Part Models for Zero-Modified Count and Semicontinuous Data. Health Serv Res 2017. [DOI: 10.1007/978-1-4939-6704-9_17-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022] Open
15
Neelon B, O'Malley AJ, Smith VA. Modeling zero-modified count and semicontinuous data in health services research Part 1: background and overview. Stat Med 2016; 35:5070-5093. [DOI: 10.1002/sim.7050] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 08/06/2015] [Revised: 05/04/2016] [Accepted: 06/27/2016] [Indexed: 11/09/2022]
Affiliation(s)
- Brian Neelon
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- A. James O'Malley
  - Department of Biomedical Data Science and The Dartmouth Institute for Health Policy and Clinical Practice, Lebanon, NH 03766, USA
- Valerie A. Smith
  - Center for Health Services Research in Primary Care, Durham VA Medical Center, Durham, NC 27705, USA
  - Division of General Internal Medicine, Department of Medicine, Duke University, Durham, NC 27710, USA
16
Dreassi E, Rocco E. A Bayesian semiparametric model for non negative semicontinuous data. COMMUN STAT-THEOR M 2016. [DOI: 10.1080/03610926.2015.1096389] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/21/2022]
17
Arcuti S, Pollice A, Ribecco N, D'Onghia G. Bayesian spatiotemporal analysis of zero-inflated biological population density data by a delta-normal spatiotemporal additive model. Biom J 2015; 58:372-86. [DOI: 10.1002/bimj.201400123] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/09/2014] [Revised: 05/28/2015] [Accepted: 06/02/2015] [Indexed: 11/06/2022]
Affiliation(s)
- Simona Arcuti
  - Dipartimento di Scienze economiche e metodi matematici, Università degli studi di Bari Aldo Moro, Largo Abbazia Santa scolastica 53, 70124 Bari, Italy
- Alessio Pollice
  - Dipartimento di Scienze economiche e metodi matematici, Università degli studi di Bari Aldo Moro, Largo Abbazia Santa scolastica 53, 70124 Bari, Italy
- Nunziata Ribecco
  - Dipartimento di Scienze economiche e metodi matematici, Università degli studi di Bari Aldo Moro, Largo Abbazia Santa scolastica 53, 70124 Bari, Italy
- Gianfranco D'Onghia
  - Dipartimento di Biologia, Università degli studi di Bari Aldo Moro, Via E. Orabona 4, 70125 Bari, Italy
18
Xing D, Huang Y, Chen H, Zhu Y, Dagne GA, Baldwin J. Bayesian inference for two-part mixed-effects model using skew distributions, with application to longitudinal semicontinuous alcohol data. Stat Methods Med Res 2015; 26:1838-1853. [PMID: 26092477 DOI: 10.1177/0962280215590284] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/15/2022]
Abstract
Semicontinuous data characterized by an excessive proportion of zeros and right-skewed continuous positive values arise frequently in practice. One example is substance abuse/dependence symptoms data, for which a substantial proportion of the subjects investigated may report zero. Two-part mixed-effects models have been developed to analyze repeated measures of semicontinuous data from longitudinal studies. In this paper, we propose a flexible two-part mixed-effects model with skew distributions for correlated semicontinuous alcohol data under the framework of a Bayesian approach. The proposed model specification consists of two mixed-effects models linked by the correlated random effects: (i) a model on the occurrence of positive values using a generalized logistic mixed-effects model (Part I); and (ii) a model on the intensity of positive values using a linear mixed-effects model where the model errors follow skew distributions including skew-t and skew-normal distributions (Part II). The proposed method is illustrated with alcohol abuse/dependence symptoms data from a longitudinal observational study, and the analytic results are reported by comparing potential models under different random-effects structures. Simulation studies are conducted to assess the performance of the proposed models and method.
Affiliation(s)
- Dongyuan Xing
  - Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, Tampa, USA
- Yangxin Huang
  - Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, Tampa, USA
- Henian Chen
  - Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, Tampa, USA
- Yiliang Zhu
  - Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, Tampa, USA
- Getachew A Dagne
  - Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, Tampa, USA
- Julie Baldwin
  - Department of Community and Family Health, College of Public Health, University of South Florida, Tampa, USA
19
Parker AJ, Bandyopadhyay D, Slate EH. A spatial augmented beta regression model for periodontal proportion data. STAT MODEL 2014. [DOI: 10.1177/1471082x14535515] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
Clinical dental research generates large amounts of data with a potentially complex correlation structure from measurements recorded at several sites throughout the mouth. Clinical attachment level (CAL) is one such measure popularly used to assess periodontal disease (PD) status. We model the proportion of sites for each tooth-type (i.e., incisor, canine, pre-molar and molar) per subject that exhibit moderate to severe PD. Disease-free and highly diseased tooth-sites cause these proportion responses to lie in the closed interval [0, 1]. In addition, PD may be spatially referenced, i.e., the disease status of a site is influenced by its neighbours. While beta regression can assess the covariate-response relationship for proportion data, its support in the interval (0, 1) impairs its ability to account for the observed proportions at zero and one. In contrast to ad hoc transformations that confine responses to (0, 1), we develop a framework that augments the beta density with non-zero masses at zero and one while also controlling for spatial referencing. Our approach is Bayesian and is computationally amenable to available software. A simulation study evaluates estimation of regression effects in scenarios of varying sample size, degree of spatial dependence and response transformations. Application to real PD data provides insights into assessing covariate effects on proportion responses.
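The idea of augmenting the beta density with point masses at zero and one can be sketched as a log-density function. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `zoab_logpdf` and the parameterisation by `p0`, `p1`, `a`, and `b` are hypothetical, and the spatial component of the model is omitted.

```python
import math


def zoab_logpdf(y, p0, p1, a, b):
    """Log-density of a beta(a, b) augmented with point masses p0 at 0 and p1 at 1.

    Hypothetical parameterisation for illustration; no covariates or spatial terms.
    """
    if p0 < 0.0 or p1 < 0.0 or p0 + p1 >= 1.0:
        raise ValueError("need p0, p1 >= 0 and p0 + p1 < 1")
    if a <= 0.0 or b <= 0.0:
        raise ValueError("beta shape parameters must be positive")
    if y == 0.0:
        return math.log(p0) if p0 > 0.0 else -math.inf
    if y == 1.0:
        return math.log(p1) if p1 > 0.0 else -math.inf
    if not 0.0 < y < 1.0:
        return -math.inf  # outside the closed interval [0, 1]
    # Log of the beta function B(a, b), computed via log-gamma for stability.
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (math.log(1.0 - p0 - p1)
            + (a - 1.0) * math.log(y)
            + (b - 1.0) * math.log(1.0 - y)
            - log_beta_fn)


# Usage: a log-likelihood for a few proportions, including exact 0 and 1.
sample = [0.0, 0.25, 0.6, 1.0]
loglik = sum(zoab_logpdf(y, p0=0.2, p1=0.1, a=2.0, b=3.0) for y in sample)
print(f"log-likelihood: {loglik:.4f}")
```

Unlike an ad hoc transformation that nudges 0 and 1 into the open interval, this form assigns the boundary values their own probabilities and leaves the interior observations on the beta scale.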
Affiliation(s)
- Anthony J Parker
- Department of Mathematics, College of Charleston, Charleston, SC 29424, USA
- Elizabeth H Slate
- Department of Statistics, Florida State University, Tallahassee, FL, 32306, USA
20
Galvis DM, Bandyopadhyay D, Lachos VH. Augmented mixed beta regression models for periodontal proportion data. Stat Med 2014; 33:3759-71. [PMID: 24764045 DOI: 10.1002/sim.6179] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 04/15/2013] [Revised: 03/27/2014] [Accepted: 03/31/2014] [Indexed: 11/08/2022]
Abstract
Continuous (clustered) proportion data often arise in various domains of medicine and public health where the response variable of interest is a proportion (or percentage) quantifying disease status for the cluster units, ranging between zero and one. However, because of the presence of relatively disease-free as well as heavily diseased subjects in any study, the proportion values can lie in the interval [0,1]. While beta regression can be adapted to assess covariate effects in these situations, its versatility is often challenged because of the presence/excess of zeros and ones because the beta support lies in the interval (0,1). To circumvent this, we augment the probabilities of zero and one with the beta density, controlling for the clustering effect. Our approach is Bayesian with the ability to borrow information across various stages of the complex model hierarchy and produces a computationally convenient framework amenable to available freeware. The marginal likelihood is tractable and can be used to develop Bayesian case-deletion influence diagnostics based on q-divergence measures. Both simulation studies and application to a real dataset from a clinical periodontology study quantify the gain in model fit and parameter estimation over other ad hoc alternatives and provide quantitative insight into assessing the true covariate effects on the proportion responses.
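The augmentation described in this abstract can be illustrated with a minimal Python sketch of the log-density of a beta distribution extended with point masses at zero and one. The mean-precision parametrization (`mu`, `phi`), the mass parameters `p0` and `p1`, and the function name `augmented_beta_logpdf` are assumptions made for illustration; they are not the authors' notation or code, and the cluster-level random effects of the paper are omitted.

```python
import math


def augmented_beta_logpdf(y, p0, p1, mu, phi):
    """Log-density of a zero-and-one augmented beta distribution.

    Assumed parametrization (illustrative only):
      p0, p1 : point masses at y == 0 and y == 1, with p0 + p1 < 1
      mu     : mean of the beta component, 0 < mu < 1
      phi    : precision of the beta component, phi > 0
    """
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie in [0, 1]")
    if p0 <= 0.0 or p1 <= 0.0 or p0 + p1 >= 1.0:
        raise ValueError("p0 and p1 must be positive and sum to less than 1")
    if y == 0.0:
        return math.log(p0)
    if y == 1.0:
        return math.log(p1)
    # Beta shape parameters implied by the mean and precision.
    a = mu * phi
    b = (1.0 - mu) * phi
    log_beta_density = (
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + (a - 1.0) * math.log(y)
        + (b - 1.0) * math.log(1.0 - y)
    )
    # The continuous part carries the remaining mass 1 - p0 - p1.
    return math.log(1.0 - p0 - p1) + log_beta_density


def augmented_beta_loglik(ys, p0, p1, mu, phi):
    """Sum of log-densities over independent observations."""
    return sum(augmented_beta_logpdf(y, p0, p1, mu, phi) for y in ys)
```

In a regression setting, `p0`, `p1` and `mu` would each be linked to covariates (for example through logit links), which this sketch leaves out.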
Affiliation(s)
- Diana M Galvis
- Departamento de Estatística, IMECC-UNICAMP, Campinas, São Paulo, Brazil
21
Dreassi E, Petrucci A, Rocco E. Small area estimation for semicontinuous skewed spatial data: An application to the grape wine production in Tuscany. Biom J 2013; 56:141-56. [DOI: 10.1002/bimj.201200271] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 12/20/2012] [Revised: 06/26/2013] [Accepted: 07/20/2013] [Indexed: 11/07/2022]
Affiliation(s)
- Emanuela Dreassi
- Dipartimento di Statistica, Informatica, Applicazioni “G. Parenti” (DiSIA); Università degli Studi di Firenze; Viale Morgagni 59 - I 50134 Florence Italy
- Alessandra Petrucci
- Dipartimento di Statistica, Informatica, Applicazioni “G. Parenti” (DiSIA); Università degli Studi di Firenze; Viale Morgagni 59 - I 50134 Florence Italy
- Emilia Rocco
- Dipartimento di Statistica, Informatica, Applicazioni “G. Parenti” (DiSIA); Università degli Studi di Firenze; Viale Morgagni 59 - I 50134 Florence Italy
22
Hatfield LA, Boye ME, Carlin BP. Joint modeling of multiple longitudinal patient-reported outcomes and survival. J Biopharm Stat 2012; 21:971-91. [PMID: 21830926 DOI: 10.1080/10543406.2011.590922] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 12/20/2022]
Abstract
Researchers often include patient-reported outcomes (PROs) in Phase III clinical trials to demonstrate the value of treatment from the patient's perspective. These data are collected as longitudinal repeated measures and are often censored by occurrence of a clinical event that defines a survival time. Hierarchical Bayesian models having latent individual-level trajectories provide a flexible approach to modeling such multiple outcome types simultaneously. We consider the case of many zeros in the longitudinal data motivating a mixture model, and demonstrate several approaches to modeling multiple longitudinal PROs with survival in a cancer clinical trial. These joint models may enhance Phase III analyses and better inform health care decision makers.
Affiliation(s)
- Laura A Hatfield
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
23
Chen J, Liu L, Johnson BA, O'Quigley J. Penalized likelihood estimation for semiparametric mixed models, with application to alcohol treatment research. Stat Med 2012; 32:335-46. [DOI: 10.1002/sim.5528] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 11/17/2011] [Accepted: 05/11/2012] [Indexed: 11/11/2022]
Affiliation(s)
- Jinsong Chen
- Department of Preventive Medicine; Northwestern University; Chicago IL U.S.A
- Lei Liu
- Department of Preventive Medicine; Northwestern University; Chicago IL U.S.A
- Bankole A. Johnson
- Department of Psychiatry and Neurobehavioral Sciences; University of Virginia; Charlottesville VA U.S.A
- John O'Quigley
- Laboratoire de Statistique Théorique et Appliquée Université Pierre et Marie Curie; Paris VI France
24
Hatfield LA, Boye ME, Hackshaw MD, Carlin BP. Multilevel Bayesian Models for Survival Times and Longitudinal Patient-Reported Outcomes With Many Zeros. J Am Stat Assoc 2012. [DOI: 10.1080/01621459.2012.664517] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 10/27/2022]
Affiliation(s)
- Laura A. Hatfield
- a Department of Health Care Policy, Harvard Medical School, Boston, MA, 02115
- Mark E. Boye
- b Global Health Outcomes, Eli Lilly and Company, Indianapolis, IN, 46285
- Michelle D. Hackshaw
- c Global Health Outcomes—Oncology, Eli Lilly and Company, Indianapolis, IN, 46285
- d Global Health Outcomes—Oncology, Merck & Co., Inc., Whitehouse Station, NJ, 08889
- Bradley P. Carlin
- e Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, 55455
25
Mittlböck M, Edler L, LeBlanc M, Niland J, Zwinderman K. Second Issue for Computational Statistics for Clinical Research. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2012.01.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
26
Neelon B, O'Malley AJ, Normand SLT. A Bayesian two-part latent class model for longitudinal medical expenditure data: assessing the impact of mental health and substance abuse parity. Biometrics 2011; 67:280-9. [PMID: 20528856 DOI: 10.1111/j.1541-0420.2010.01439.x] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 11/26/2022]
Abstract
In 2001, the U.S. Office of Personnel Management required all health plans participating in the Federal Employees Health Benefits Program to offer mental health and substance abuse benefits on par with general medical benefits. The initial evaluation found that, on average, parity did not result in either large spending increases or increased service use over the four-year observational period. However, some groups of enrollees may have benefited from parity more than others. To address this question, we propose a Bayesian two-part latent class model to characterize the effect of parity on mental health use and expenditures. Within each class, we fit a two-part random effects model to separately model the probability of mental health or substance abuse use and mean spending trajectories among those having used services. The regression coefficients and random effect covariances vary across classes, thus permitting class-varying correlation structures between the two components of the model. Our analysis identified three classes of subjects: a group of low spenders that tended to be male, had relatively rare use of services, and decreased their spending pattern over time; a group of moderate spenders, primarily female, that had an increase in both use and mean spending after the introduction of parity; and a group of high spenders that tended to have chronic service use and constant spending patterns. By examining the joint 95% highest probability density regions of expected changes in use and spending for each class, we confirmed that parity had an impact only on the moderate spender class.
Affiliation(s)
- Brian Neelon
- Nicholas School of the Environment, Duke University, Durham, North Carolina 27708, USA.
27
White JA, Yang X, Todd PA, Lerche NW. Longitudinal patterns of viremia and oral shedding of rhesus rhadinovirus and retroperitoneal fibromatosis herpesviruses in age-structured captive breeding populations of rhesus macaques (Macaca mulatta). Comp Med 2011; 61:60-70. [PMID: 21819683 PMCID: PMC3060420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2010] [Revised: 05/09/2010] [Accepted: 09/14/2010] [Indexed: 05/31/2023]
Abstract
Rhesus rhadinovirus (RRV) and retroperitoneal fibromatosis herpesvirus (RFHV), 2 closely related γ2 herpesviruses, are endemic in breeding populations of rhesus macaques at our institution. We previously reported significantly different prevalence levels, suggesting the transmission dynamics of RRV and RFHV differ with regard to viral shedding and infectivity. We designed a longitudinal study to further examine the previously observed differences between RRV and RFHV prevalence and the potential influence of age, season, and housing location on the same 90 rhesus macaques previously studied. Virus- and host-genome-specific real-time PCR assays were used to determine viral loads for both RRV and RFHV in blood and saliva samples collected at 6 time points over an 18-mo period. Proportions of positive animals and viral load in blood and saliva were compared between and within viruses by age group, location, and season by using 2-part longitudinal modeling with Bayesian inferences. Our results demonstrate that age and season are significant determinants, with age as the most significant factor analyzed, of viremia and oral shedding for both RRV and RFHV, and these pathogens exhibit distinctly different patterns of viremia and oral shedding over time within a single population.
Affiliation(s)
- Jessica A White
- California National Primate Research Center, School of Medicine, University of California, Davis, California, Koelle Lab, University of Washington, Seattle, Washington, USA.
28
Yang Y, Simpson D. Unified Computational Methods for Regression Analysis of Zero-Inflated and Bound-Inflated Data. Comput Stat Data Anal 2010; 54:1525-1534. [PMID: 20228950 DOI: 10.1016/j.csda.2009.12.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
Bounded data with excess observations at the boundary are common in many areas of application. Various individual cases of inflated mixture models have been studied in the literature for bound-inflated data, yet the computational methods have been developed separately for each type of model. In this article we use a common framework for computing these models, and expand the range of models for both discrete and semi-continuous data with point inflation at the lower boundary. The quasi-Newton and EM algorithms are adapted and compared for estimation of model parameters. The numerical Hessian and generalized Louis method are investigated as means for computing standard errors after optimization. Correlated data are included in this framework via generalized estimating equations. The estimation of parameters and effectiveness of standard errors are demonstrated through simulation and in the analysis of data from an ultrasound bioeffect study. The unified approach enables reliable computation for a wide class of inflated mixture models and comparison of competing models.
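As a rough illustration of the EM approach to inflated mixture models mentioned in this abstract, the sketch below fits an intercept-only zero-inflated Poisson model. The choice of the Poisson case, the absence of covariates, and the names `fit_zip_em`, `pi` and `lam` are assumptions made for illustration; this is not the authors' unified framework or their code.

```python
import math


def fit_zip_em(counts, n_iter=500):
    """EM estimates for an intercept-only zero-inflated Poisson model.

    Assumed model (illustrative only):
      P(Y = 0) = pi + (1 - pi) * exp(-lam)
      P(Y = k) = (1 - pi) * Poisson(k; lam) for k >= 1
    Returns the pair (pi, lam).
    """
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive count")

    # Simple starting values.
    pi, lam = 0.5, total / n
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the inflation mass and the Poisson mean.
        pi = n_zero * w / n
        lam = total / (n - n_zero * w)
    return pi, lam


# Toy data with many zeros (made up for this example).
example_counts = [0] * 60 + [1] * 10 + [2] * 15 + [3] * 10 + [4] * 5
pi_hat, lam_hat = fit_zip_em(example_counts)
```

At convergence the fitted probability of a zero, `pi_hat + (1 - pi_hat) * exp(-lam_hat)`, matches the observed proportion of zeros, which gives a quick sanity check on the fit.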
Affiliation(s)
- Yan Yang
- Department of Mathematics and Statistics, Arizona State University, Wexler Hall, Tempe, AZ 85287, USA
29