1
Anyaso-Samuel S, Datta S. Testing for marginal covariate effect when the subgroup size induced by the covariate is informative. Stat Methods Med Res 2024:9622802241254196. [PMID: 38767219 DOI: 10.1177/09622802241254196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
In many cluster-correlated data analyses, informative cluster size poses a challenge that can potentially introduce bias in statistical analyses. Different methodologies have been introduced in statistical literature to address this bias. In this study, we consider a complex form of informativeness where the number of observations corresponding to latent levels of a unit-level continuous covariate within a cluster is associated with the response variable. This type of informativeness has not been explored in prior research. We present a novel test statistic designed to evaluate the effect of the continuous covariate while accounting for the presence of informativeness. The covariate induces a continuum of latent subgroups within the clusters, and our test statistic is formulated by aggregating values from an established statistic that accounts for informative subgroup sizes when comparing group-specific marginal distributions. Through carefully designed simulations, we compare our test with four traditional methods commonly employed in the analysis of cluster-correlated data. Only our test maintains the size across all data-generating scenarios with informativeness. We illustrate the proposed method to test for marginal associations in periodontal data with this distinctive form of informativeness.
Affiliation(s)
- Somnath Datta
- Department of Biostatistics, University of Florida, Gainesville, FL, USA
2
Bible J, St Ville M, Albert PS, Liu D. Accounting for informative observation process in transition models of binary longitudinal outcome: Application to medical record data. Stat Methods Med Res 2024; 33:243-255. [PMID: 38303569 DOI: 10.1177/09622802231225527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
When extracting medical record data to form a retrospective cohort, investigators typically focus on a pre-specified study window and select subjects who had hospital visits during that window. However, such data extraction may suffer from an informative observation process, since sicker patients may have hospital visits more frequently. For example, the Consecutive Pregnancy Study is a retrospective cohort study of women with multiple pregnancies in 23 Utah hospitals from 2003 to 2010, where the interest is in understanding the risk factors of recurrent pregnancy outcomes, such as preterm birth. The observation process is informative in the sense that women with adverse pregnancy outcomes may be less likely, willing, or able to endure subsequent pregnancies. We proposed a three-part joint model with a shared random effects structure to address this analytic complication. In particular, a first-order transition model is used for the longitudinal binary outcome; a gamma regression model is assumed for the inter-pregnancy intervals; and a continuation ratio model specifies the probability of continuing with more births in the future. We note that the latter two parts give rise to a parametric cure-rate survival model. The performance of the proposed method was examined in extensive simulation studies, with both correctly specified and misspecified models. The analyses of the Consecutive Pregnancy Study data further demonstrate the inadequacies of fitting the transition model alone while ignoring the informative observation process.
Affiliation(s)
- Joe Bible
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC, USA
- Madeleine St Ville
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC, USA
- Paul S Albert
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
- Danping Liu
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
3
Lee CY, Wong KY, Lam KF, Bandyopadhyay D. A semiparametric joint model for cluster size and subunit-specific interval-censored outcomes. Biometrics 2023; 79:2010-2022. [PMID: 36377514 PMCID: PMC10183480 DOI: 10.1111/biom.13795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2021] [Accepted: 11/04/2022] [Indexed: 11/16/2022]
Abstract
Clustered data frequently arise in biomedical studies, where observations, or subunits, measured within a cluster are associated. The cluster size is said to be informative, if the outcome variable is associated with the number of subunits in a cluster. In most existing work, the informative cluster size issue is handled by marginal approaches based on within-cluster resampling, or cluster-weighted generalized estimating equations. Although these approaches yield consistent estimation of the marginal models, they do not allow estimation of within-cluster associations and are generally inefficient. In this paper, we propose a semiparametric joint model for clustered interval-censored event time data with informative cluster size. We use a random effect to account for the association among event times of the same cluster as well as the association between event times and the cluster size. For estimation, we propose a sieve maximum likelihood approach and devise a computationally-efficient expectation-maximization algorithm for implementation. The estimators are shown to be strongly consistent, with the Euclidean components being asymptotically normal and achieving semiparametric efficiency. Extensive simulation studies are conducted to evaluate the finite-sample performance, efficiency and robustness of the proposed method. We also illustrate our method via application to a motivating periodontal disease dataset.
Affiliation(s)
- Chun Yin Lee
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
- Kin Yau Wong
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
- K. F. Lam
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
4
Shen B, Chen C, Chinchilli VM, Ghahramani N, Zhang L, Wang M. Semiparametric marginal methods for clustered data adjusting for informative cluster size with nonignorable zeros. Biom J 2022; 64:898-911. [PMID: 35257406 DOI: 10.1002/bimj.202100161] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/22/2021] [Revised: 10/26/2021] [Accepted: 12/22/2021] [Indexed: 11/10/2022]
Abstract
Clustered or longitudinal data are commonly encountered in clinical trials and observational studies. This type of data could be collected through a real-time monitoring scheme associated with some specific event, such as disease recurrence, hospitalization, or emergency room visit. In these contexts, the cluster size could be informative because of its potential correlation with disease status, since more frequent observations may indicate a worsening health condition. However, for some clusters/subjects, there are no measures or relevant medical records. Under such circumstances, these clusters/subjects may have a considerably lower risk of an event occurrence or may not be susceptible to such events at all, indicating a nonignorable zero cluster size. There is a substantial body of literature using observations only from those clusters with a nonzero informative cluster size, but few works discuss informative nonignorable zero-sized clusters. To utilize the information from both event-free and event-occurring participants, we propose a weighted within-cluster-resampling (WWCR) method and its asymptotically equivalent method, dual-weighted generalized estimating equations (WWGEE), by adopting the inverse probability weighting technique. The asymptotic properties are presented with rigorous theoretical justification. Extensive simulations and an illustrative example of the Assessment, Serial Evaluation, and Subsequent Sequelae of Acute Kidney Injury (ASSESS-AKI) study are performed to analyze the finite-sample behavior of our methods and to show their advantageous performance compared to the existing approaches.
Affiliation(s)
- Biyi Shen
- Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
- Chixiang Chen
- Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Vernon M Chinchilli
- Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
- Lijun Zhang
- Institute of Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA
- Ming Wang
- Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
5
Roman ZJ, Brandt H, Miller JM. Automated Bot Detection Using Bayesian Latent Class Models in Online Surveys. Front Psychol 2022; 13:789223. [PMID: 35572225 PMCID: PMC9093679 DOI: 10.3389/fpsyg.2022.789223] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/04/2021] [Accepted: 03/29/2022] [Indexed: 11/16/2022] Open
Abstract
Behavioral scientists have become increasingly reliant on online survey platforms such as Amazon's Mechanical Turk (MTurk). These platforms have many advantages: for example, they provide ease of access to difficult-to-sample populations, a large pool of participants, and an easy-to-use implementation. A major drawback is the existence of bots that are used to complete online surveys for financial gain. These bots contaminate data and need to be identified in order to draw valid conclusions from data obtained with these platforms. In this article, we provide a Bayesian latent class joint modeling approach that can be routinely applied to identify bots and simultaneously estimate a model of interest. This method can be used to separate the bots' response patterns from real human responses that were provided in line with the item content. The model has the advantage that it is very flexible and is based on plausible assumptions that are met in most empirical settings. We provide a simulation study that investigates the performance of the model under several relevant scenarios including sample size, proportion of bots, and model complexity. We show that ignoring bots leads to severe parameter bias, whereas the Bayesian latent class model results in unbiased estimates and thus controls this source of bias. We illustrate the model and its capabilities with data from an empirical political ideation survey with known bots. We discuss the implications of the findings with regard to future data collection via online platforms.
Affiliation(s)
- Holger Brandt
- Department of Psychology, Faculty of Mathematics and Natural Sciences, University of Tübingen, Tübingen, Germany
6
A Bayesian joint model for continuous and zero-inflated count data in developmental toxicity studies. Communications for Statistical Applications and Methods 2022. [DOI: 10.29220/csam.2022.29.2.239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
7
Tang PC, Duggal NM, Haft JW, Romano MA, Bolling SF, Abou El Ela A, Wu X, Colvin MM, Aaronson KD, Pagani FD. Left Ventricular Assist Device Implantation in Patients with Preoperative Severe Mitral Regurgitation. ASAIO J 2021; 67:1139-1147. [PMID: 34570728 DOI: 10.1097/mat.0000000000001379] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022] Open
Abstract
We examined cardiac features associated with residual mitral regurgitation (MR) following continuous-flow left ventricular assist device (cfLVAD) implant. From 2003 to 2017, 134 patients with severe MR underwent cfLVAD implant without mitral valve (MV) intervention. Echocardiographic (echo) assessment occurred pre-cfLVAD, early post-cfLVAD, and at last available echo. Ventricular and atrial volumes were calculated from established formulas and normalized to be predicted. Cluster analysis based on preoperative normalized left ventricular and atrial volumes and MV height identified grades 1, 2, and 3 with progressively larger cardiac chamber sizes. Median early echo follow-up was 0.92 (0.55, 1.45) months and the last follow-up was 15.12 (5.28, 38.28) months. Mitral regurgitation improved early after cfLVAD by 2.10 ± 1.16 grades (p < 0.01). Mitral regurgitation severity at the last echocardiogram positively correlated with the preoperative left ventricular volume (p = 0.014, R = 0.212), left atrial volume (p = 0.007, R = 0.233), MV anteroposterior height (p = 0.032, R = 0.185), and MV mediolateral diameter (p = 0.043, R = 0.175). Morphologically, smaller grade 1 hearts were correlated with MR resolution at the late follow-up (p = 0.023). Late right ventricular failure (RVF) at the last clinical follow-up was less frequent in grade 1 (4/48 [8.3%]) compared with grades 2 and 3 (26/86 [30.2%]; p = 0.004). Grade 1 cardiac dimensions correlated with improvement in severe MR and with less late RVF.
Affiliation(s)
- Monica M Colvin
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan Frankel Cardiovascular Center, Ann Arbor, Michigan
- Keith D Aaronson
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan Frankel Cardiovascular Center, Ann Arbor, Michigan
8
Razie F, Bahrami Samani E, Ganjali M. Analysis of mixed longitudinal (k, l)-Inflated power series, ordinal and continuous responses with sensitivity analysis to non-ignorable missing mechanism. COMMUN STAT-SIMUL C 2021. [DOI: 10.1080/03610918.2019.1601215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Farzaneh Razie
- Department of Statistics Faculty of Mathematical Science, Shahid Beheshti University, Tehran, Iran
- Ehsan Bahrami Samani
- Department of Statistics Faculty of Mathematical Science, Shahid Beheshti University, Tehran, Iran
- Mojtaba Ganjali
- Department of Statistics Faculty of Mathematical Science, Shahid Beheshti University, Tehran, Iran
9
Pavlou M, Ambler G, Omar RZ. Risk prediction in multicentre studies when there is confounding by cluster or informative cluster size. BMC Med Res Methodol 2021; 21:135. [PMID: 34218793 PMCID: PMC8254921 DOI: 10.1186/s12874-021-01321-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/17/2020] [Accepted: 05/19/2021] [Indexed: 12/04/2022] Open
Abstract
Background: Clustered data arise in research when patients are clustered within larger units. Generalised Estimating Equations (GEE) and Generalised Linear Mixed Models (GLMM) can be used to provide marginal and cluster-specific inference and predictions, respectively. Methods: Confounding by Cluster (CBC) and Informative Cluster Size (ICS) are two complications that may arise when modelling clustered data. CBC can arise when the distribution of a predictor variable (termed ‘exposure’) varies between clusters, causing confounding of the exposure-outcome relationship. ICS means that the cluster size conditional on covariates is not independent of the outcome. In both situations, standard GEE and GLMM may provide biased or misleading inference, and modifications have been proposed. However, both CBC and ICS are routinely overlooked in the context of risk prediction, and their impact on the predictive ability of the models has been little explored. We study the effect of CBC and ICS on the predictive ability of risk models for binary outcomes when GEE and GLMM are used. We examine whether two simple approaches to handle CBC and ICS, which involve adjusting for the cluster mean of the exposure and the cluster size, respectively, can improve the accuracy of predictions. Results: Both CBC and ICS can be viewed as violations of the assumptions in the standard GLMM; the random effects are correlated with exposure for CBC and cluster size for ICS. Based on these principles, we simulated data subject to CBC/ICS. The simulation studies suggested that the predictive ability of models derived from using standard GLMM and GEE ignoring CBC/ICS was affected. Marginal predictions were found to be mis-calibrated. Adjusting for the cluster mean of the exposure or the cluster size improved calibration, discrimination and the overall predictive accuracy of marginal predictions, by explaining part of the between-cluster variability. The presence of CBC/ICS did not affect the accuracy of conditional predictions. We illustrate these concepts using real data from a multicentre study with potential CBC. Conclusion: Ignoring CBC and ICS when developing prediction models for clustered data can affect the accuracy of marginal predictions. Adjusting for the cluster mean of the exposure or the cluster size can improve the predictive accuracy of marginal predictions. Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-021-01321-x.
10
Wilkinson J, Vail A, Roberts SA. Multivariate prediction of mixed, multilevel, sequential outcomes arising from in vitro fertilisation. Diagn Progn Res 2021; 5:2. [PMID: 33472692 PMCID: PMC7818923 DOI: 10.1186/s41512-020-00091-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/21/2020] [Accepted: 12/14/2020] [Indexed: 12/23/2022] Open
Abstract
In vitro fertilisation (IVF) comprises a sequence of interventions concerned with the creation and culture of embryos which are then transferred to the patient's uterus. While the clinically important endpoint is birth, the responses to each stage of treatment contain additional information about the reasons for success or failure. As such, the ability to predict not only the overall outcome of the cycle, but also the stage-specific responses, can be useful. This could be done by developing separate models for each response variable, but recent work has suggested that it may be advantageous to use a multivariate approach to model all outcomes simultaneously. Here, joint analysis of the sequential responses is complicated by mixed outcome types defined at two levels (patient and embryo). A further consideration is whether and how to incorporate information about the response at each stage in models for subsequent stages. We develop a case study using routinely collected data from a large reproductive medicine unit in order to investigate the feasibility and potential utility of multivariate prediction in IVF. We consider two possible scenarios. In the first, stage-specific responses are to be predicted prior to treatment commencement. In the second, responses are predicted dynamically, using the outcomes of previous stages as predictors. In both scenarios, we fail to observe benefits of joint modelling approaches compared to fitting separate regression models for each response variable.
Affiliation(s)
- Jack Wilkinson
- Centre for Biostatistics, Division of Population Health, Health Services Research, and Primary Care, Manchester Academic Health Science Centre, University of Manchester, Manchester, M13 9PL, UK.
- Andy Vail
- Centre for Biostatistics, Division of Population Health, Health Services Research, and Primary Care, Manchester Academic Health Science Centre, University of Manchester, Manchester, M13 9PL, UK
- Stephen A Roberts
- Centre for Biostatistics, Division of Population Health, Health Services Research, and Primary Care, Manchester Academic Health Science Centre, University of Manchester, Manchester, M13 9PL, UK
11
Gueorguieva R, Buta E, Morean M, Krishnan-Sarin S. Two-part models for repeatedly measured ordinal data with "don't know" category. Stat Med 2020; 39:4574-4592. [PMID: 32909252 DOI: 10.1002/sim.8739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2019] [Revised: 08/03/2020] [Accepted: 08/08/2020] [Indexed: 11/09/2022]
Abstract
Ordinal data (eg, "low," "medium," "high"; graded response on a Likert scale) with an additional "don't know" category are frequently encountered in the medical, social, and behavioral science literature. The handling of a "don't know" option presents unique challenges as it often "destroys" the ordinal nature of the data. Commonly, nominal models are employed which ignore the partial ordering and have a complicated interpretation, especially in situations with repeatedly measured outcomes. We propose two-part models that easily accommodate longitudinal partially ordered (semiordinal) data. The most easily interpretable formulation consists of a random effect logistic submodel for "don't know" vs all the other categories combined, and a random effect ordinal submodel for the ordered categories. Correlated random effects account for statistical dependence within individual. An extension allowing for nonproportionality of odds for the predictor effects in the ordinal submodel is also considered. Maximum likelihood estimation is performed using adaptive Gaussian quadrature in SAS PROC NLMIXED. A simulation study is performed to evaluate the performance of the estimation algorithm in terms of bias and efficiency, and to compare the results of joint and separate models of the two parts, and of proportional and nonproportional model formulations. The methods are motivated and illustrated on a dataset from a study of adolescents' perceptions of nicotine strength of JUUL e-cigarettes. Using the proposed approach we show that adolescents perceive 5% nicotine content as relatively low, a misconception more pronounced among past month nonusers than among past month users of JUUL e-cigarettes.
Affiliation(s)
- Ralitza Gueorguieva
- Department of Biostatistics, Yale Center for the Study of Tobacco Products (TCORS), Yale School of Public Health, New Haven, Connecticut, USA; Department of Psychiatry, Yale Center for the Study of Tobacco Products (TCORS), Yale School of Medicine, New Haven, Connecticut, USA
- Eugenia Buta
- Department of Biostatistics, Yale Center for the Study of Tobacco Products (TCORS), Yale School of Public Health, New Haven, Connecticut, USA
- Meghan Morean
- Department of Psychiatry, Yale Center for the Study of Tobacco Products (TCORS), Yale School of Medicine, New Haven, Connecticut, USA
- Suchitra Krishnan-Sarin
- Department of Psychiatry, Yale Center for the Study of Tobacco Products (TCORS), Yale School of Medicine, New Haven, Connecticut, USA
12
Silk MJ, Harrison XA, Hodgson DJ. Perils and pitfalls of mixed-effects regression models in biology. PeerJ 2020. [DOI: 10.7717/peerj.9522] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Indexed: 02/05/2023] Open
Abstract
Biological systems, at all scales of organisation from nucleic acids to ecosystems, are inherently complex and variable. Biologists therefore use statistical analyses to detect signal among this systemic noise. Statistical models infer trends, find functional relationships and detect differences that exist among groups or are caused by experimental manipulations. They also use statistical relationships to help predict uncertain futures. All branches of the biological sciences now embrace the possibilities of mixed-effects modelling and its flexible toolkit for partitioning noise and signal. The mixed-effects model is not, however, a panacea for poor experimental design, and should be used with caution when inferring or deducing the importance of both fixed and random effects. Here we describe a selection of the perils and pitfalls that are widespread in the biological literature, but can be avoided by careful reflection, modelling and model-checking. We focus on situations where incautious modelling risks exposure to these pitfalls and the drawing of incorrect conclusions. Our stance is that statements of significance, information content or credibility all have their place in biological research, as long as these statements are cautious and well-informed by checks on the validity of assumptions. Our intention is to reveal potential perils and pitfalls in mixed model estimation so that researchers can use these powerful approaches with greater awareness and confidence. Our examples are ecological, but translate easily to all branches of biology.
Affiliation(s)
- Matthew J. Silk
- Centre for Ecology and Conservation, University of Exeter, Penryn, Cornwall, UK
- Environment and Sustainability Institute, University of Exeter, Penryn, Cornwall, UK
- Xavier A. Harrison
- Centre for Ecology and Conservation, University of Exeter, Penryn, Cornwall, UK
- David J. Hodgson
- Centre for Ecology and Conservation, University of Exeter, Penryn, Cornwall, UK
13
McGee G, Kioumourtzoglou M, Weisskopf MG, Haneuse S, Coull BA. On the interplay between exposure misclassification and informative cluster size. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Affiliation(s)
- Glen McGee
- Harvard T.H. Chan School of Public Health, Boston, USA
14
Affiliation(s)
- Yanqin Feng
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
- Shurong Lin
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
- Yang Li
- Department of Mathematics and Statistics, The University of North Carolina at Charlotte, Charlotte, North Carolina, United States
15
Lee D, Kim JK, Skinner CJ. Within-cluster resampling for multilevel models under informative cluster size. Biometrika 2019. [DOI: 10.1093/biomet/asz035] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022] Open
Abstract
Summary
A within-cluster resampling method is proposed for fitting a multilevel model in the presence of informative cluster size. Our method is based on the idea of removing the information in the cluster sizes by drawing bootstrap samples which contain a fixed number of observations from each cluster. We then estimate the parameters by maximizing an average, over the bootstrap samples, of a suitable composite loglikelihood. The consistency of the proposed estimator is shown and does not require that the correct model for cluster size is specified. We give an estimator of the covariance matrix of the proposed estimator, and a test for the noninformativeness of the cluster sizes. A simulation study shows, as in Neuhaus & McCulloch (2011), that the standard maximum likelihood estimator exhibits little bias for some regression coefficients. However, for those parameters which exhibit nonnegligible bias, the proposed method is successful in correcting for this bias.
Affiliation(s)
- D Lee
- Department of Statistics, Iowa State University, 2438 Osborn Drive, Ames, Iowa 50011, USA
- J K Kim
- Department of Statistics, Iowa State University, 2438 Osborn Drive, Ames, Iowa 50011, USA
- C J Skinner
- Department of Statistics, London School of Economics and Political Science, Houghton Street, London, WC2A 2AE, UK
16
Cluster analysis of preoperative echocardiographic findings and outcomes following left ventricular device implantation. J Thorac Cardiovasc Surg 2019; 157:1851-1860.e1. [DOI: 10.1016/j.jtcvs.2018.11.099] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/06/2018] [Revised: 10/23/2018] [Accepted: 11/14/2018] [Indexed: 11/22/2022]
17
McGee G, Weisskopf MG, Kioumourtzoglou MA, Coull BA, Haneuse S. Informatively empty clusters with application to multigenerational studies. Biostatistics 2019; 21:775-789. [PMID: 30958890 PMCID: PMC7777575 DOI: 10.1093/biostatistics/kxz005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/21/2018] [Revised: 02/25/2019] [Accepted: 02/26/2019] [Indexed: 11/13/2022] Open
Abstract
Exposures with multigenerational effects have profound implications for public health, affecting increasingly more people as the exposed population reproduces. Multigenerational studies, however, are susceptible to informative cluster size, occurring when the number of children to a mother (the cluster size) is related to their outcomes, given covariates. A natural question then arises: what if some women bear no children at all? The impact of these potentially informative empty clusters is currently unknown. This article first evaluates the performance of standard methods for informative cluster size when cluster size is permitted to be zero. We find that if the informative cluster size mechanism induces empty clusters, standard methods lead to biased estimates of target parameters. Joint models of outcome and size are capable of valid conditional inference as long as empty clusters are explicitly included in the analysis, but in practice empty clusters regularly go unacknowledged. In contrast, estimating equation approaches necessarily omit empty clusters and therefore yield biased estimates of marginal effects. To resolve this, we propose a joint marginalized approach that readily incorporates empty clusters and even in their absence permits more intuitive interpretations of population-averaged effects than do current methods. Competing methods are compared via simulation and in a study of the impact of in-utero exposure to diethylstilbestrol on the risk of attention-deficit/hyperactivity disorder (ADHD) among 106 198 children to 47 540 nurses from the Nurses Health Study.
Affiliation(s)
- Glen McGee
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
| | - Marc G Weisskopf
- Departments of Environmental Health and Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
| | - Marianthi-Anna Kioumourtzoglou
- Department of Environmental Health Sciences, Columbia University Mailman School of Public Health, 722 W. 168th St, New York, NY 10032, USA
| | - Brent A Coull
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
| | - Sebastien Haneuse
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
| |
|
18
|
Tang PC, Haft JW, Romano MA, Bitar A, Hasan R, Palardy M, Wu X, Aaronson KD, Pagani FD. Right ventricular function and residual mitral regurgitation after left ventricular assist device implantation determines the incidence of right heart failure. J Thorac Cardiovasc Surg 2019; 159:897-905.e4. [PMID: 31101350 DOI: 10.1016/j.jtcvs.2019.03.089] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/15/2018] [Revised: 02/19/2019] [Accepted: 03/26/2019] [Indexed: 11/30/2022]
Abstract
BACKGROUND: The effect of significant mitral regurgitation (MR) on outcomes after continuous flow left ventricular assist device (cfLVAD) implantation remains unclear. METHODS: We performed a retrospective review of prospectively collected data from 159 patients with preoperative severe MR who underwent cfLVAD implantation (2003-2017). Two-step cluster analysis using the log-likelihood distance was performed on post-cfLVAD implantation parameters, which included right ventricular (RV) dysfunction, MR severity, and tricuspid regurgitation (TR) severity. Post-cfLVAD implantation echocardiographic parameters were obtained within the first month. RESULTS: Cluster analysis resulted in 3 groups. Group 1 (n = 67) had mild or less MR with moderate-severe RV dysfunction (RVD). Group 2 (n = 43) had moderate-severe MR with moderate-severe RVD. Group 3 (n = 49) had moderate MR with mild RVD. Group 2 had the largest proportion with Interagency Registry for Mechanically Assisted Circulatory Support score of 1 (30.2%) and 2 (41.9%). They were more likely to undergo temporary mechanical circulatory support (18.6%) and tricuspid valve procedure (62.8%). Group 2 had the highest rate of stroke (30.2%; P = .02), hemolysis (39.5%; P = .01), device thrombosis (30%; P = .01), and worst survival (46.5%; P = .01). Survival at 5 years for groups 1, 2, and 3 was 56.0%, 17.6%, and 55.8%, respectively. Regression analysis of the entire population showed that greater MR severity after cfLVAD was associated with RV failure (P < .05; odds ratio, 1.6) and RV assist device use (P = .09; odds ratio, 1.6). After excluding tricuspid valve repairs, MR severity had a positive correlation with TR severity (R = 0.33; P < .01). CONCLUSIONS: After cfLVAD implantation, moderate-severe MR and RVD predicted RV failure. Patients with preoperative moderate-severe MR and TR coupled with moderate-severe RVD might benefit the most from mitral and tricuspid valve intervention.
Affiliation(s)
- Paul C Tang
- Department of Cardiac Surgery, University of Michigan Frankel Cardiovascular Center, Ann Arbor, Mich.
| | - Jonathan W Haft
- Department of Cardiac Surgery, University of Michigan Frankel Cardiovascular Center, Ann Arbor, Mich
| | - Matthew A Romano
- Department of Cardiac Surgery, University of Michigan Frankel Cardiovascular Center, Ann Arbor, Mich
| | - Abbas Bitar
- Division of Cardiovascular Medicine, University of Michigan Frankel Cardiovascular Center, Ann Arbor, Mich
| | - Reema Hasan
- Division of Cardiovascular Medicine, University of Michigan Frankel Cardiovascular Center, Ann Arbor, Mich
| | - Maryse Palardy
- Division of Cardiovascular Medicine, University of Michigan Frankel Cardiovascular Center, Ann Arbor, Mich
| | - Xiaoting Wu
- Department of Cardiac Surgery, University of Michigan Frankel Cardiovascular Center, Ann Arbor, Mich
| | - Keith D Aaronson
- Division of Cardiovascular Medicine, University of Michigan Frankel Cardiovascular Center, Ann Arbor, Mich
| | - Francis D Pagani
- Department of Cardiac Surgery, University of Michigan Frankel Cardiovascular Center, Ann Arbor, Mich
| |
|
19
|
Mitani AA, Kaye EK, Nelson KP. Marginal analysis of ordinal clustered longitudinal data with informative cluster size. Biometrics 2019; 75:938-949. [PMID: 30859544 DOI: 10.1111/biom.13050] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2019] [Accepted: 02/26/2019] [Indexed: 11/30/2022]
Abstract
The issue of informative cluster size (ICS) often arises in the analysis of dental data. ICS describes a situation where the outcome of interest is related to cluster size. Much of the work on modeling marginal inference in longitudinal studies with potential ICS has focused on continuous outcomes. However, periodontal disease outcomes, including clinical attachment loss, are often assessed using ordinal scoring systems. In addition, participants may lose teeth over the course of the study due to advancing disease status. Here we develop longitudinal cluster-weighted generalized estimating equations (CWGEE) to model the association of ordinal clustered longitudinal outcomes with participant-level health-related covariates, including metabolic syndrome and smoking status, and potentially decreasing cluster size due to tooth-loss, by fitting a proportional odds logistic regression model. The within-teeth correlation coefficient over time is estimated using the two-stage quasi-least squares method. The motivation for our work stems from the Department of Veterans Affairs Dental Longitudinal Study in which participants regularly received general and oral health examinations. In an extensive simulation study, we compare results obtained from CWGEE with various working correlation structures to those obtained from conventional GEE which does not account for ICS. Our proposed method yields results with very low bias and excellent coverage probability in contrast to a conventional generalized estimating equations approach.
Affiliation(s)
- Aya A Mitani
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, 02118
| | - Elizabeth K Kaye
- Department of Health Policy and Health Services Research, Boston University Henry M. Goldman School of Dental Medicine, Boston, Massachusetts, 02118
| | - Kerrie P Nelson
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, 02118
| |
|
20
|
Yan G, Ma R, Tariqul Hasan M. A Joint Poisson State-Space Modelling Approach to Analysis of Binomial Series with Random Cluster Sizes. Int J Biostat 2019; 15:/j/ijb.ahead-of-print/ijb-2018-0090/ijb-2018-0090.xml. [PMID: 30897063 DOI: 10.1515/ijb-2018-0090] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2018] [Accepted: 03/01/2019] [Indexed: 11/15/2022]
Abstract
Serially correlated binomial data with random cluster sizes occur frequently in environmental and health studies. Such data series have traditionally been analyzed using binomial state-space or hidden Markov models without appropriately accounting for the randomness in the cluster sizes. To characterize correlation and extra-variation arising from the random cluster sizes properly, we introduce a joint Poisson state-space modelling approach to analysis of binomial series with random cluster sizes. This approach enables us to model the marginal counts and binomial proportions simultaneously. An optimal estimation of our model has been developed using the orthodox best linear unbiased predictors. This estimation method is computationally efficient and robust since it depends only on the first- and second-moment assumptions of unobserved random effects. Our proposed approach is illustrated with analysis of birth delivery data.
Affiliation(s)
- Guohua Yan
- Department of Mathematics and Statistics, University of New Brunswick, Fredericton, Canada
| | - Renjun Ma
- Department of Mathematics and Statistics, University of New Brunswick, Fredericton, Canada
| | - M Tariqul Hasan
- Department of Mathematics and Statistics, University of New Brunswick, Fredericton, Canada
| |
|
21
|
Pareek B, Ghosh P, Wilson HN, Macdonald EK, Baines P. Tracking the Impact of Media on Voter Choice in Real Time: A Bayesian Dynamic Joint Model. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2017.1419134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Affiliation(s)
- Bhuvanesh Pareek
- Department of Operations Management and Quantitative Techniques, Indian Institute of Management, Indore, India
| | - Pulak Ghosh
- Department of Decision Sciences and Information Systems, Indian Institute of Management, Bangalore, India
| | - Hugh N. Wilson
- Cranfield School of Management, Cranfield University, Cranfield, Bedfordshire, UK
| | - Emma K. Macdonald
- Cranfield School of Management, Cranfield University, Cranfield, Bedfordshire, UK
| | - Paul Baines
- Cranfield School of Management, Cranfield University, Cranfield, Bedfordshire, UK
| |
|
22
|
Fang D, Sun R, Wilson JR. Joint modeling of correlated binary outcomes: The case of contraceptive use and HIV knowledge in Bangladesh. PLoS One 2018; 13:e0190917. [PMID: 29351328 PMCID: PMC5774700 DOI: 10.1371/journal.pone.0190917] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2017] [Accepted: 12/23/2017] [Indexed: 11/18/2022] Open
Abstract
Recent advances in statistical methods enable the study of correlation among outcomes through joint modeling, thereby addressing spillover effects. By joint modeling, we refer to simultaneously analyzing two or more different response variables emanating from the same individual. Using the 2011 Bangladesh Demographic and Health Survey, we jointly address spillover effects between contraceptive use (CUC) and knowledge of HIV and other sexually transmitted diseases. Jointly modeling these two outcomes is appropriate because certain types of contraceptive use contribute to the prevention of HIV and STDs and the knowledge and awareness of HIV and STDs typically lead to protection during sexual intercourse. In particular, we compared the differences as they pertained to the interpretive advantage of modeling the spillover effects of jointly modeling HIV and CUC as opposed to addressing them separately. We also identified risk factors that determine contraceptive use and knowledge of HIV and STDs among women in Bangladesh. We found that by jointly modeling the correlation between HIV knowledge and contraceptive use, the importance of education decreased. The HIV prevention program had a spillover effect on CUC: what seemed to be impacted by education can be partially attributed to one's exposure to HIV knowledge. The joint model revealed a less significant impact of covariates as opposed to both separate models and standard models. Additionally, we found a spillover effect that would have otherwise been undiscovered if we did not jointly model. These findings further suggested that the simultaneous impact of correlated outcomes can be adequately addressed for the commonality between different responses and deflate, which is otherwise overestimated when examined separately.
Affiliation(s)
- Di Fang
- Department of Agricultural Economics and Agribusiness, University of Arkansas, Fayetteville, AR, United States of America
| | - Renyuan Sun
- School of Mathematical and Statistical Science, Arizona State University, Tempe, AZ, United States of America
| | - Jeffrey R. Wilson
- Department of Economics, Arizona State University, Tempe, AZ, United States of America
| |
|
23
|
Nevalainen J, Oja H, Datta S. Tests for informative cluster size using a novel balanced bootstrap scheme. Stat Med 2017; 36:2630-2640. [PMID: 28324913 DOI: 10.1002/sim.7288] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2016] [Revised: 12/16/2016] [Accepted: 03/02/2017] [Indexed: 11/06/2022]
Abstract
Clustered data are often encountered in biomedical studies, and to date, a number of approaches have been proposed to analyze such data. However, the phenomenon of informative cluster size (ICS) is a challenging problem, and its presence has an impact on the choice of a correct analysis methodology. For example, Dutta and Datta (2015, Biometrics) presented a number of marginal distributions that could be tested. Depending on the nature and degree of informativeness of the cluster size, these marginal distributions may differ, as do the choices of the appropriate test. In particular, they applied their new test to a periodontal data set where the plausibility of the informativeness was mentioned, but no formal test for the same was conducted. We propose bootstrap tests for testing the presence of ICS. A balanced bootstrap method is developed to successfully estimate the null distribution by merging the re-sampled observations with closely matching counterparts. Relying on the assumption of exchangeability within clusters, the proposed procedure performs well in simulations even with a small number of clusters, at different distributions and against different alternative hypotheses, thus making it an omnibus test. We also explain how to extend the ICS test to a regression setting, thereby enhancing its practical utility. The methodologies are illustrated using the periodontal data set mentioned earlier. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
| | - Hannu Oja
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
| | - Somnath Datta
- Department of Biostatistics, University of Florida, Gainesville, FL, U.S.A
| |
|
24
|
Risk Assessment for Toxicity Experiments with Discrete and Continuous Outcomes: A Bayesian Nonparametric Approach. J Agric Biol Environ Stat 2017. [DOI: 10.1007/s13253-017-0293-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
25
|
Chaurasia A, Liu D, Albert PS. Pattern-mixture models with incomplete informative cluster size: Application to a repeated pregnancy study. J R Stat Soc Ser C Appl Stat 2017. [PMID: 29531406 DOI: 10.1111/rssc.12226] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
The incomplete informative cluster size problem is motivated by the NICHD Consecutive Pregnancies Study, aiming to study the relationship between pregnancy outcomes and parity. These pregnancy outcomes are potentially associated with the number of births over a woman's lifetime, resulting in an incomplete informative cluster size (censored at the end of the study window). We develop a pattern mixture model for informative cluster size by treating the lifetime number of births as a latent variable. We compare this approach with a simple alternative method that approximates the pattern mixture model. We show that the latent variable approach possesses good statistical properties for estimating both the mean trajectory of birthweight and the proportion of gestational hypertension with increasing parity.
Affiliation(s)
- Ashok Chaurasia
- School of Public Health and Health Systems, University of Waterloo, 200 University Avenue West, LHN 1721. Waterloo N2L 3G1, Canada
| | - Danping Liu
- Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, 6710B Rockledge Dr., Room 3214, MSC 7004. Bethesda, MD 20817, USA
| | - Paul S Albert
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Room 7E146. Rockville, MD 20892, USA
| |
|
26
|
Liu S, Manatunga AK, Peng L, Marcus M. A joint modeling approach for multivariate survival data with random length. Biometrics 2016; 73:666-677. [PMID: 27704528 DOI: 10.1111/biom.12588] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2015] [Revised: 08/01/2016] [Accepted: 08/01/2016] [Indexed: 11/29/2022]
Abstract
In many biomedical studies that involve correlated data, an outcome is often repeatedly measured for each individual subject along with the number of these measurements, which is also treated as an observed outcome. This type of data has been referred to as multivariate random length data by Barnhart and Sampson (1995). A common approach to handling such data is to jointly model the multiple measurements and the random length. In previous literature, a key assumption is the multivariate normality for the multiple measurements. Motivated by a reproductive study, we propose a new copula-based joint model which relaxes the normality assumption. Specifically, we adopt the Clayton-Oakes model for multiple measurements with flexible marginal distributions specified as semi-parametric transformation models. The random length is modeled via a generalized linear model. We develop an approximate EM algorithm to derive parameter estimators, and standard errors of the estimators are obtained through bootstrapping procedures. The finite-sample performance of the proposed method is investigated using simulation studies. We apply our method to the Mount Sinai Study of Women Office Workers (MSSWOW), where women were prospectively followed for 1 year for studying fertility.
Affiliation(s)
- Shuling Liu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, U.S.A
| | - Amita K Manatunga
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, U.S.A
| | - Limin Peng
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, U.S.A
| | - Michele Marcus
- Department of Epidemiology, Emory University, Atlanta, Georgia, U.S.A
| |
|
27
|
Abstract
Latent trait models have long been used in the social science literature for studying variables that can only be measured indirectly through multiple items. However, such models are also very useful in accounting for correlation in multivariate and longitudinal data, particularly when outcomes have mixed measurement scales. Bayesian methods implemented with Markov chain Monte Carlo provide a flexible framework for routine fitting of a broad class of latent variable (LV) models, including very general structural equation models. However, in considering LV models, a number of challenging issues arise, including identifiability, confounding between the mean and variance, uncertainty in different aspects of the model, and difficulty in computation. Motivated by the problem of modelling multidimensional longitudinal data, this article reviews the recent literature, provides some recommendations and highlights areas in need of additional research, focusing on methods for model uncertainty.
Affiliation(s)
- David B Dunson
- Biostatistics Branch, National Institute of Environmental Health Sciences, NC 27709, USA.
| |
|
28
|
A note on misspecification in joint modeling of correlated data with informative cluster sizes. J Stat Plan Inference 2016. [DOI: 10.1016/j.jspi.2015.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
29
|
Antonelli J, Trippa L, Haneuse S. Mitigating Bias in Generalized Linear Mixed Models: The Case for Bayesian Nonparametrics. Stat Sci 2016; 31:80-95. [PMID: 28979066 PMCID: PMC5624537 DOI: 10.1214/15-sts533] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
Generalized linear mixed models are a common statistical tool for the analysis of clustered or longitudinal data where correlation is accounted for through cluster-specific random effects. In practice, the distribution of the random effects is typically taken to be a Normal distribution, although if this does not hold then the model is misspecified and standard estimation/inference may be invalid. An alternative is to perform a so-called nonparametric Bayesian analysis in which one assigns a Dirichlet process (DP) prior to the unknown distribution of the random effects. In this paper we examine operating characteristics for estimation of fixed effects and random effects based on such an analysis under a range of "true" random effects distributions. As part of this we investigate various approaches for selection of the precision parameter of the DP prior. In addition, we illustrate the use of the methods with an analysis of post-operative complications among n = 18,643 female Medicare beneficiaries who underwent a hysterectomy procedure at N = 503 hospitals in the US. Overall, we conclude that using the DP prior in modeling the random effect distribution results in large reductions of bias with little loss of efficiency. While no single choice for the precision parameter will be optimal in all settings, certain strategies such as importance sampling or empirical Bayes can be used to obtain reasonable results in a broad range of data scenarios.
Affiliation(s)
- Joseph Antonelli
- Postdoctoral Fellow, Department of Biostatistics, Harvard Chan School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115, USA
| | - Lorenzo Trippa
- Assistant Professor, Department of Biostatistics, Dana-Farber Cancer Institute, Center for Life Science, 3 Blackfan Circle, Boston, Massachusetts 02115, USA
| | - Sebastien Haneuse
- Associate Professor, Department of Biostatistics, Harvard Chan School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115, USA
| |
|
30
|
Bible J, Beck JD, Datta S. Cluster adjusted regression for displaced subject data (CARDS): Marginal inference under potentially informative temporal cluster size profiles. Biometrics 2015; 72:441-51. [PMID: 26682911 DOI: 10.1111/biom.12456] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2014] [Revised: 10/01/2015] [Accepted: 10/01/2015] [Indexed: 11/30/2022]
Abstract
Ignorance of the mechanisms responsible for the availability of information presents an unusual problem for analysts. It is often the case that the availability of information is dependent on the outcome. In the analysis of cluster data we say that a condition for informative cluster size (ICS) exists when the inference drawn from analysis of hypothetical balanced data varies from that of inference drawn on observed data. Much work has been done in order to address the analysis of clustered data with informative cluster size; examples include Inverse Probability Weighting (IPW), Cluster Weighted Generalized Estimating Equations (CWGEE), and Doubly Weighted Generalized Estimating Equations (DWGEE). When cluster size changes with time, i.e., the data set possesses temporally varying cluster sizes (TVCS), these methods may produce biased inference for the underlying marginal distribution of interest. We propose a new marginalization that may be appropriate for addressing clustered longitudinal data with TVCS. The principal motivation for our present work is to analyze the periodontal data collected by Beck et al. (1997, Journal of Periodontal Research 6, 497-505). Longitudinal periodontal data often exhibit both ICS and TVCS as the number of teeth possessed by participants at the onset of study is not constant and teeth as well as individuals may be displaced throughout the study.
Affiliation(s)
- Joe Bible
- University of Louisville, Louisville, Kentucky, 40292, U.S.A
| | - James D Beck
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 27599, U.S.A
| | - Somnath Datta
- University of Florida, Gainesville, Florida, 32610, U.S.A
| |
|
31
|
Fanshawe TR, Chapman CM, Crick T. Lymphangiogenesis and carcinoma in the uterine cervix: Joint and hierarchical models for random cluster sizes and continuous outcomes. Ann Appl Stat 2015. [DOI: 10.1214/15-aoas867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
32
|
Zhang B, Liu W, Zhang Z, Qu Y, Chen Z, Albert PS. Modeling of correlated data with informative cluster sizes: An evaluation of joint modeling and within-cluster resampling approaches. Stat Methods Med Res 2015; 26:1881-1895. [DOI: 10.1177/0962280215592268] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Joint modeling and within-cluster resampling are two approaches that are used for analyzing correlated data with informative cluster sizes. Motivated by a developmental toxicity study, we examined the performance and validity of these two approaches in testing covariate effects in generalized linear mixed-effects models. We show that the joint modeling approach is robust to the misspecification of cluster size models in terms of Type I and Type II errors when the corresponding covariates are not included in the random effects structure; otherwise, statistical tests may be affected. We also evaluate the performance of the within-cluster resampling procedure and thoroughly investigate its validity in modeling correlated data with informative cluster sizes. We show that within-cluster resampling is a valid alternative to joint modeling for cluster-specific covariates, but it is invalid for time-dependent covariates. The two methods are applied to a developmental toxicity study that investigated the effect of exposure to diethylene glycol dimethyl ether.
Affiliation(s)
- Bo Zhang
- Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, USA
| | - Wei Liu
- Department of Mathematics, Harbin Institute of Technology, Harbin, P.R. China
| | - Zhiwei Zhang
- Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, USA
| | - Yanping Qu
- Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, USA
| | - Zhen Chen
- Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, USA
| | - Paul S Albert
- Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, USA
| |
|
33
|
Fronczyk K, Kottas A. A Bayesian Nonparametric Modeling Framework for Developmental Toxicity Studies. J Am Stat Assoc 2014. [DOI: 10.1080/01621459.2013.830445] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
34
|
|
35
|
Seaman S, Pavlou M, Copas A. Review of methods for handling confounding by cluster and informative cluster size in clustered data. Stat Med 2014; 33:5371-87. [PMID: 25087978 PMCID: PMC4320764 DOI: 10.1002/sim.6277] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2013] [Accepted: 07/08/2014] [Indexed: 01/23/2023]
Abstract
Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland.
|
36
|
Seaman SR, Pavlou M, Copas AJ. Methods for observed-cluster inference when cluster size is informative: a review and clarifications. Biometrics 2014; 70:449-56. [PMID: 24479899 PMCID: PMC4312901 DOI: 10.1111/biom.12151] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2013] [Revised: 11/01/2013] [Accepted: 01/01/2014] [Indexed: 11/28/2022]
Abstract
Clustered data commonly arise in epidemiology. We assume each cluster member has an outcome Y and covariates X. When there are missing data in Y, the distribution of Y given X in all cluster members ("complete clusters") may be different from the distribution just in members with observed Y ("observed clusters"). Often the former is of interest, but when data are missing because in a fundamental sense Y does not exist (e.g., quality of life for a person who has died), the latter may be more meaningful (quality of life conditional on being alive). Weighted and doubly weighted generalized estimating equations and shared random-effects models have been proposed for observed-cluster inference when cluster size is informative, that is, the distribution of Y given X in observed clusters depends on observed cluster size. We show these methods can be seen as actually giving inference for complete clusters and may not also give observed-cluster inference. This is true even if observed clusters are complete in themselves rather than being the observed part of larger complete clusters: here methods may describe imaginary complete clusters rather than the observed clusters. We show under which conditions shared random-effects models proposed for observed-cluster inference do actually describe members with observed Y. A psoriatic arthritis dataset is used to illustrate the danger of misinterpreting estimates from shared random-effects models.
|
37
|
Xu Y, Lee CF, Cheung YB. Analyzing Binary Outcome Data with Small Clusters: A Simulation Study. Commun Stat Simul Comput 2014. [DOI: 10.1080/03610918.2012.744044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
38
|
Maity A, Williams PL, Ryan L, Missmer SA, Coull BA, Hauser R. Analysis of in vitro fertilization data with multiple outcomes using discrete time-to-event analysis. Stat Med 2013; 33:1738-49. [PMID: 24317880 DOI: 10.1002/sim.6050] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 06/06/2012] [Revised: 04/26/2013] [Accepted: 11/03/2013] [Indexed: 11/06/2022]
Abstract
In vitro fertilization (IVF) is an increasingly common method of assisted reproductive technology. Because of the careful observation and follow-up required as part of the procedure, IVF studies provide an ideal opportunity to identify and assess clinical and demographic factors along with environmental exposures that may impact successful reproduction. A major challenge in analyzing data from IVF studies is handling the complexity and multiplicity of outcomes, resulting from both multiple opportunities for pregnancy loss within a single IVF cycle and multiple IVF cycles. To date, most evaluations of IVF studies do not make use of the full data because of its complex structure. In this paper, we develop statistical methodology for analysis of IVF data with multiple cycles and possibly multiple failure types observed for each individual. We develop a general analysis framework based on a generalized linear modeling formulation that allows implementation of various types of models including shared frailty models, failure-specific frailty models, and transitional models, using standard software. We apply our methodology to data from an IVF study conducted at the Brigham and Women's Hospital, Massachusetts. We also summarize the performance of our proposed methods on the basis of a simulation study.
Affiliation(s)
- Arnab Maity
- Department of Statistics, North Carolina State University, 2311 Stinson Drive, Raleigh, NC, 27695, U.S.A
|
39
|
Chatterjee S. Development of uncertainty-based work injury model using Bayesian structural equation modelling. Int J Inj Contr Saf Promot 2013; 21:318-27. [PMID: 24111548 DOI: 10.1080/17457300.2013.825629] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
Abstract
This paper proposed a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distributions for the factor loading parameters and structural parameters of the SEM. In the first approach, the prior distributions were considered as a fixed distribution function with specific parameter values, whereas in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayes' rule. Markov Chain Monte Carlo sampling in the form of Gibbs sampling was applied for sampling from the posterior distribution. The results revealed that all coefficients of structural and measurement model parameters are statistically significant with experts' opinion-based priors, whereas two coefficients are not statistically significant when fixed prior-based distributions are applied. The error statistics reveal that the Bayesian structural model provides a reasonably good fit of work injury, with a high coefficient of determination (0.91) and lower mean squared error compared to traditional SEM.
Affiliation(s)
- Snehamoy Chatterjee
- Department of Mining Engineering, National Institute of Technology Rourkela, Orissa 769008, India
|
40
|
Hwang BS, Pennell ML. Semiparametric Bayesian joint modeling of a binary and continuous outcome with applications in toxicological risk assessment. Stat Med 2013; 33:1162-75. [PMID: 24123309 DOI: 10.1002/sim.6007] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 01/07/2013] [Revised: 06/24/2013] [Accepted: 09/19/2013] [Indexed: 11/08/2022]
Abstract
Many dose-response studies collect data on correlated outcomes. For example, in developmental toxicity studies, uterine weight and presence of malformed pups are measured on the same dam. Joint modeling can result in more efficient inferences than independent models for each outcome. Most methods for joint modeling assume standard parametric response distributions. However, in toxicity studies, it is possible that response distributions vary in location and shape with dose, which may not be easily captured by standard models. To address this issue, we propose a semiparametric Bayesian joint model for a binary and continuous response. In our model, a kernel stick-breaking process prior is assigned to the distribution of a random effect shared across outcomes, which allows flexible changes in distribution shape with dose shared across outcomes. The model also includes outcome-specific fixed effects to allow different location effects. In simulation studies, we found that the proposed model provides accurate estimates of toxicological risk when the data do not satisfy assumptions of standard parametric models. We apply our method to data from a developmental toxicity study of ethylene glycol diethyl ether.
Affiliation(s)
- Beom Seuk Hwang
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, U.S.A
|
41
|
Zhang X, Sun J. Semiparametric regression analysis of clustered interval-censored failure time data with informative cluster size. Int J Biostat 2013; 9:205-14. [PMID: 23940070 DOI: 10.1515/ijb-2012-0047] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
Clustered interval-censored failure time data are commonly encountered in many medical settings. In such situations, one issue that often arises in practice is that the cluster size is related to the risk for the outcome of interest. It is well-known that ignoring the informativeness of the cluster size can result in biased parameter estimates. In this article, we consider regression analysis of clustered interval-censored data with informative cluster size with the focus on semiparametric methods. For the problem, two approaches are presented and investigated. One is a within-cluster resampling procedure and the other is a weighted estimating equation approach. Unlike previously published methods, the new approaches take into account cluster sizes and heterogeneous correlation structures without imposing strong parametric assumptions. A simulation experiment is carried out to evaluate the performance of the proposed approaches and indicates that they perform well for practical situations. The approaches are applied to a lymphatic filariasis study that motivated this study.
|
42
|
Najita JS, Catalano PJ. On determining the BMD from multiple outcomes in developmental toxicity studies when one outcome is intentionally missing. Risk Analysis 2013; 33:1500-1509. [PMID: 23231656 PMCID: PMC3683380 DOI: 10.1111/j.1539-6924.2012.01939.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/01/2023]
Abstract
Public health concerns over the occurrence of developmental abnormalities that can occur as a result of prenatal exposure to drugs, chemicals, and other environmental factors have led to a number of developmental toxicity studies and the use of the benchmark dose (BMD) for risk assessment. To characterize risk from multiple sources, more recent analytic methods involve a joint modeling approach, accounting for multiple dichotomous and continuous outcomes. For some continuous outcomes, evaluating all subjects may not be feasible, and only a subset may be evaluated due to limited resources. The subset can be selected according to a prespecified probability model and the unobserved data can be viewed as intentionally missing in the sense that subset selection results in missingness that is experimentally planned. We describe a subset selection model that allows for sampling pups with malformations and healthy pups at different rates, and includes the well-known simple random sample (SRS) as a special case. We were interested in understanding how sampling rates that are selected beforehand influence the precision of the BMD. Using simulations we show how improvements over the SRS can be obtained by oversampling malformations, and how some sampling rates can yield precision that is substantially worse than the SRS. We also illustrate the potential for cost saving with oversampling. Simulations are based on a joint mixed effects model and, to account for subset selection, use case weights to obtain valid dose-response estimates.
Affiliation(s)
- Julie S Najita
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
|
43
|
Nevalainen J, Datta S, Oja H. Inference on the marginal distribution of clustered data with informative cluster size. Stat Pap (Berl) 2013; 55:71-92. [PMID: 25878396 DOI: 10.1007/s00362-013-0504-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/27/2022]
Abstract
In spite of recent contributions to the literature, informative cluster size settings are not well known and understood. In this paper, we give a formal definition of the problem and describe it from different viewpoints. Data generating mechanisms, parametric and nonparametric models are considered in light of examples. Our emphasis is on nonparametric and robust approaches to the inference on the marginal distribution. Descriptive statistics and parameters of interest are defined as functionals and they are accompanied with a generally applicable testing procedure. The theory is illustrated with an example on patients with incomplete spinal cord injuries.
Affiliation(s)
- Jaakko Nevalainen
- Department of Mathematics and Statistics, University of Turku, FI-20014 Turku, Finland
- Hannu Oja
- University of Tampere, Tampere, Finland
|
44
|
Li X, Bandyopadhyay D, Lipsitz S, Sinha D. Likelihood methods for binary responses of present components in a cluster. Biometrics 2011; 67:629-35. [PMID: 20825395 PMCID: PMC3005556 DOI: 10.1111/j.1541-0420.2010.01483.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
In some biomedical studies involving clustered binary responses (say, disease status), the cluster sizes can vary because some components of the cluster can be absent. When both the presence of a cluster component as well as the binary disease status of a present component are treated as responses of interest, we propose a novel two-stage random effects logistic regression framework. For the ease of interpretation of regression effects, both the marginal probability of presence/absence of a component as well as the conditional probability of disease status of a present component, preserve the approximate logistic regression forms. We present a maximum likelihood method of estimation implementable using standard statistical software. We compare our models and the physical interpretation of regression effects with competing methods from literature. We also present a simulation study to assess the robustness of our procedure to wrong specification of the random effects distribution and to compare finite-sample performances of estimates with existing methods. The methodology is illustrated via analyzing a study of the periodontal health status in a diabetic Gullah population.
Affiliation(s)
- Xiaoyun Li
- Department of Statistics, Florida State University, Tallahassee, Florida 32306, USA.
|
45
|
Chen Z, Zhang B, Albert PS. A joint modeling approach to data with informative cluster size: robustness to the cluster size model. Stat Med 2011; 30:1825-36. [PMID: 21495060 DOI: 10.1002/sim.4239] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 06/29/2010] [Accepted: 02/18/2011] [Indexed: 11/06/2022]
Abstract
In many biomedical and epidemiological studies, data are often clustered due to longitudinal follow up or repeated sampling. While in some clustered data the cluster size is pre-determined, in others it may be correlated with the outcome of subunits, resulting in informative cluster size. When the cluster size is informative, standard statistical procedures that ignore cluster size may produce biased estimates. One attractive framework for modeling data with informative cluster size is the joint modeling approach in which a common set of random effects are shared by both the outcome and cluster size models. In addition to making distributional assumptions on the shared random effects, the joint modeling approach needs to specify the cluster size model. Questions arise as to whether the joint modeling approach is robust to misspecification of the cluster size model. In this paper, we studied both asymptotic and finite-sample characteristics of the maximum likelihood estimators in joint models when the cluster size model is misspecified. We found that using an incorrect distribution for the cluster size may induce small to moderate biases, while using a misspecified functional form for the shared random parameter in the cluster size model results in nearly unbiased estimation of outcome model parameters. We also found that there is little efficiency loss under this model misspecification. A developmental toxicity study was used to motivate the research and to demonstrate the findings.
Affiliation(s)
- Zhen Chen
- Biostatistics and Bioinformatics Branch, Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, MD 20852, U.S.A.
|
46
|
Bandyopadhyay D, Reich BJ, Slate EH. A spatial beta-binomial model for clustered count data on dental caries. Stat Methods Med Res 2011; 20:85-102. [PMID: 20511359 PMCID: PMC2948643 DOI: 10.1177/0962280210372453] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
One of the most important indicators of dental caries prevalence is the total count of decayed, missing or filled surfaces in a tooth. These count data are often clustered in nature (several count responses clustered within a subject), over-dispersed as well as spatially referenced (a diseased tooth might be positively influencing the decay process of a set of neighbouring teeth). In this article, we develop a multivariate spatial beta-binomial (BB) model for these data that accommodates both over-dispersion and latent spatial associations. Using a Bayesian paradigm, the re-parameterised marginal mean (as well as variance) under the BB framework is modelled using a regression on subject/tooth-specific co-variables and a conditionally autoregressive prior that models the latent spatial process. The necessity of exploiting spatial associations to model count data arising in dental caries research is demonstrated using a small simulation study. Real data confirm that our spatial BB model provides superior estimation and model fit compared to other sub-models that do not model spatial associations.
Affiliation(s)
- Dipankar Bandyopadhyay
- Division of Biostatistics and Epidemiology, Medical University of South Carolina, Charleston, SC 29425, USA.
|
47
|
Neuhaus JM, McCulloch CE. Estimation of covariate effects in generalized linear mixed models with informative cluster sizes. Biometrika 2011; 98:147-162. [PMID: 23049125 DOI: 10.1093/biomet/asq066] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]
Abstract
In standard regression analyses of clustered data, one typically assumes that the expected value of the response is independent of cluster size. However, this is often false. For example, in studies of surgical interventions, investigators have frequently found surgery volume and outcomes to be related to the skill level of the surgeons. This paper examines the effect of ignoring response-dependent, informative, cluster sizes on standard analytical methods such as mixed-effects models and conditional likelihood methods using analytic calculations, simulation studies and an example from a study of periodontal disease. We consider the case in which cluster sizes and responses share random effects which we assume to be independent of the covariates. Our focus is on maximum likelihood methods that ignore informative cluster sizes, and we show that they exhibit little bias in estimating covariate effects that are uncorrelated with the random effects associated with cluster sizes. However, estimation of covariate effects that are associated with the random effects can be biased. In particular, for models with random intercepts only, ignoring informative cluster sizes can yield biased estimators of the intercept but little bias in estimation of all covariate effects.
Affiliation(s)
- John M Neuhaus
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California 94143-0560, U.S.A.
|
48
|
|
49
|
Kim YJ. Regression analysis of clustered interval-censored data with informative cluster size. Stat Med 2010; 29:2956-62. [DOI: 10.1002/sim.4042] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
|
50
|
Sedehi M, Mehrabi Y, Kazemnejad A, Joharimajd V, Hadaegh F. RETRACTED ARTICLE: Artificial neural network for prediction of mixed response variables: simulation and application. Neural Comput Appl 2010. [DOI: 10.1007/s00521-010-0436-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/04/2023]
|