1
Zhou T, Ji Y. Bayesian Methods for Information Borrowing in Basket Trials: An Overview. Cancers (Basel) 2024; 16:251. PMID: 38254740; PMCID: PMC10813856; DOI: 10.3390/cancers16020251.
Abstract
Basket trials allow simultaneous evaluation of a single therapy across multiple cancer types or subtypes of the same cancer. Since the same treatment is tested across all baskets, it may be desirable to borrow information across them to improve the statistical precision and power in estimating and detecting the treatment effects in different baskets. We review recent developments in Bayesian methods for the design and analysis of basket trials, focusing on the mechanism of information borrowing. We explain the common components of these methods, such as a prior model for the treatment effects that embodies an assumption of exchangeability. We also discuss the distinct features of these methods that lead to different degrees of borrowing. Through simulation studies, we demonstrate the impact of information borrowing on the operating characteristics of these methods and discuss its broader implications for drug development. Examples of basket trials are presented in both phase I and phase II settings.
Affiliation(s)
- Tianjian Zhou
- Department of Statistics, Colorado State University, Fort Collins, CO 80523, USA
- Yuan Ji
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
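As a concrete (and deliberately minimal) illustration of the exchangeability-based borrowing these methods formalize, the sketch below shrinks basket-level log-odds toward a common mean in a normal-normal model, with a moment-based heterogeneity estimate standing in for a fully Bayesian prior on the between-basket variance; all counts are invented.

```python
import numpy as np

# Hypothetical basket trial: responders / enrolled in 5 baskets (invented data)
responders = np.array([6, 5, 3, 8, 2])
enrolled   = np.array([20, 19, 21, 22, 18])

# Work on the log-odds scale with a normal approximation per basket
p_hat = responders / enrolled
y = np.log(p_hat / (1 - p_hat))            # basket-level log-odds
s2 = 1 / (enrolled * p_hat * (1 - p_hat))  # approximate sampling variances

# Method-of-moments (DerSimonian-Laird) estimate of between-basket variance
w = 1 / s2
mu_fe = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - mu_fe) ** 2)
tau2 = max(0.0, (Q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Normal-normal shrinkage: each basket is pulled toward the common mean,
# more strongly when tau2 is small (baskets look exchangeable) or s2 is large
w_star = 1 / (s2 + tau2)
mu = np.sum(w_star * y) / np.sum(w_star)
B = s2 / (s2 + tau2)                       # per-basket shrinkage factor
theta = (1 - B) * y + B * mu               # shrunken log-odds
p_shrunk = 1 / (1 + np.exp(-theta))        # shrunken response rates
```

Stronger borrowing (small tau2) trades basket-specific unbiasedness for lower variance, which is exactly the tension the review's simulation studies explore.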
2
Röver C, Ursino M, Friede T, Zohar S. A straightforward meta-analysis approach for oncology phase I dose-finding studies. Stat Med 2022; 41:3915-3940. PMID: 35661205; DOI: 10.1002/sim.9484.
Abstract
Phase I early-phase clinical studies aim at investigating the safety and the underlying dose-toxicity relationship of a drug or combination. While little may still be known about the compound's properties, it is crucial to consider quantitative information available from any studies that may have been conducted previously on the same drug. A meta-analytic approach has the advantages of being able to properly account for between-study heterogeneity, and it may be readily extended to prediction or shrinkage applications. Here we propose a simple and robust two-stage approach for the estimation of maximum tolerated dose(s) utilizing penalized logistic regression and Bayesian random-effects meta-analysis methodology. Implementation is facilitated using standard R packages. The properties of the proposed methods are investigated in Monte Carlo simulations. The investigations are motivated and illustrated by two examples from oncology.
Affiliation(s)
- Christian Röver
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Moreno Ursino
- Unit of Clinical Epidemiology, AP-HP, CHU Robert Debré, Université Paris Cité, Inserm CIC-EC 1426, Paris, France; Inserm, Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, Paris, France; HeKA, Inria Paris, Paris, France
- Tim Friede
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Sarah Zohar
- Inserm, Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, Paris, France; HeKA, Inria Paris, Paris, France
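A toy version of the first stage of this two-stage idea, with invented dose-toxicity counts for a single study: a ridge-penalized logistic fit (standing in for the paper's penalized-regression stage), then reading off the dose at a 25% target toxicity rate. A second stage would pool the resulting log-MTD estimates across studies with a random-effects meta-analysis.

```python
import numpy as np

# Invented single-study data: doses, patients treated, dose-limiting toxicities
log_dose = np.log(np.array([1.0, 2.0, 4.0, 8.0, 16.0]))
n_treated = np.array([3, 3, 6, 6, 3])
n_tox = np.array([0, 0, 1, 2, 2])

def fit_penalized_logistic(x, n, y, lam=0.1, iters=50):
    """Newton iterations for logit P(tox) = b0 + b1*x with a ridge penalty."""
    beta = np.zeros(2)
    X = np.column_stack([np.ones_like(x), x])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - n * p) - lam * beta          # penalized score
        W = n * p * (1 - p)
        H = -(X.T * W) @ X - lam * np.eye(2)           # penalized Hessian
        beta = beta - np.linalg.solve(H, grad)
    return beta

b0, b1 = fit_penalized_logistic(log_dose, n_treated, n_tox)

# Estimated MTD: the dose where the fitted toxicity probability hits the
# target rate (25% here), i.e. solve b0 + b1*log(d) = logit(0.25)
target = 0.25
log_mtd = (np.log(target / (1 - target)) - b0) / b1
mtd = np.exp(log_mtd)
```

The penalty keeps the fit stable with the tiny per-dose counts typical of phase I cohorts, which is the practical motivation for penalization in the paper's first stage.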
3
Zyphur MJ, Hamaker EL, Tay L, Voelkle M, Preacher KJ, Zhang Z, Allison PD, Pierides DC, Koval P, Diener EF. From Data to Causes III: Bayesian Priors for General Cross-Lagged Panel Models (GCLM). Front Psychol 2021; 12:612251. PMID: 33658961; PMCID: PMC7917264; DOI: 10.3389/fpsyg.2021.612251.
Abstract
This article describes some potential uses of Bayesian estimation for time-series and panel data models by incorporating information from prior probabilities (i.e., priors) in addition to observed data. Drawing on econometrics and other literatures we illustrate the use of informative “shrinkage” or “small variance” priors (including so-called “Minnesota priors”) while extending prior work on the general cross-lagged panel model (GCLM). Using a panel dataset of national income and subjective well-being (SWB) we describe three key benefits of these priors. First, they shrink parameter estimates toward zero or toward each other for time-varying parameters, which lends additional support for an income → SWB effect that is not supported with maximum likelihood (ML). This is useful because, second, these priors increase model parsimony and the stability of estimates (keeping them within more reasonable bounds) and thus improve out-of-sample predictions and interpretability, which means estimated effects should also be more trustworthy than under ML. Third, these priors allow estimating otherwise under-identified models under ML, allowing higher-order lagged effects and time-varying parameters that are otherwise impossible to estimate using observed data alone. In conclusion, we note some of the responsibilities that come with the use of priors which, departing from typical commentaries on their scientific applications, we describe as involving reflection on how best to apply modeling tools to address matters of worldly concern.
Affiliation(s)
- Michael J Zyphur
- Department of Management and Marketing, The University of Melbourne, Parkville, VIC, Australia
- Ellen L Hamaker
- Department of Methodology and Statistics, Utrecht University, Utrecht, Netherlands
- Louis Tay
- Department of Psychological Sciences, Purdue University, West Lafayette, IN, United States
- Manuel Voelkle
- Department of Psychology, Humboldt University of Berlin, Berlin, Germany
- Kristopher J Preacher
- Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, United States
- Zhen Zhang
- Cox School of Business, Southern Methodist University, Dallas, TX, United States; W.P. Carey School of Business, Arizona State University, Tempe, AZ, United States
- Paul D Allison
- Department of Sociology, University of Pennsylvania, Philadelphia, PA, United States
- Dean C Pierides
- Stirling Management School, University of Stirling, Stirling, United Kingdom
- Peter Koval
- Melbourne School of Psychological Sciences, The University of Melbourne, Parkville, VIC, Australia
- Edward F Diener
- Department of Psychology, The University of Utah, Salt Lake City, UT, United States; Department of Psychology, University of Virginia, Charlottesville, VA, United States
4
Röver C, Friede T. Bounds for the weight of external data in shrinkage estimation. Biom J 2021; 63:1131-1143. PMID: 33629749; DOI: 10.1002/bimj.202000227.
Abstract
Shrinkage estimation in a meta-analysis framework may be used to facilitate dynamical borrowing of information. This framework might be used to analyze a new study in the light of previous data, which might differ in their design (e.g., a randomized controlled trial and a clinical registry). We show how the common study weights arise in effect and shrinkage estimation, and how these may be generalized to the case of Bayesian meta-analysis. Next we develop simple ways to compute bounds on the weights, so that the contribution of the external evidence may be assessed a priori. These considerations are illustrated and discussed using numerical examples, including applications in the treatment of Creutzfeldt-Jakob disease and in fetal monitoring to prevent the occurrence of metabolic acidosis. The target study's contribution to the resulting estimate is shown to be bounded below. Therefore, concerns of evidence being easily overwhelmed by external data are largely unwarranted.
Affiliation(s)
- Christian Röver
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Tim Friede
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
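The flavor of the bound can be seen in a deliberately simplified two-study sketch (one target study plus one very precise external study; all numbers invented): in a normal random-effects model the target's weight in its own shrinkage estimate stays above tau^2 / (tau^2 + s^2), no matter how precise the external data are.

```python
import numpy as np

# Invented effects and standard errors: target study first, external second
y = np.array([0.10, 0.45])   # estimated treatment effects
s = np.array([0.30, 0.05])   # standard errors: the external study is precise
tau = 0.20                   # assumed between-study standard deviation

# Shrinkage estimate for the target effect theta_1: combine its own estimate
# with the external one, whose information about theta_1 is damped by tau^2
w_target = (1 / s[0]**2) / (1 / s[0]**2 + 1 / (tau**2 + s[1]**2))
theta_target = w_target * y[0] + (1 - w_target) * y[1]

# A priori lower bound: even infinitely precise external data cannot push
# the target's weight below tau^2 / (tau^2 + s_1^2)
w_bound = tau**2 / (tau**2 + s[0]**2)
```

Because the external contribution is capped this way, the paper's conclusion follows: concerns about the target study being overwhelmed are largely unwarranted once heterogeneity is acknowledged.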
5
|
MacNab YC. Bayesian estimation of multivariate Gaussian Markov random fields with constraint. Stat Med 2020; 39:4767-4788. [PMID: 32935375 DOI: 10.1002/sim.8752] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2020] [Revised: 06/28/2020] [Accepted: 08/08/2020] [Indexed: 11/10/2022]
Abstract
This article is concerned with conditionally formulated multivariate Gaussian Markov random fields (MGMRF) for modeling multivariate local dependencies with unknown dependence parameters subject to positivity constraint. In the context of Bayesian hierarchical modeling of lattice data in general and Bayesian disease mapping in particular, analytic and simulation studies provide new insights into various approaches to posterior estimation of dependence parameters under "hard" or "soft" positivity constraint, including the well-known strictly diagonal dominance criterion and options of hierarchical priors. Hierarchical centering is examined as a means to gain computational efficiency in Bayesian estimation of multivariate generalized linear mixed effects models in the presence of spatial confounding and weakly identified model parameters. Simulated data on irregular or regular lattice, and three datasets from the multivariate and spatiotemporal disease mapping literature, are used for illustration. The present investigation also sheds light on the use of deviance information criterion for model comparison, choice, and interpretation in the context of posterior risk predictions judged by borrowing-information and bias-precision tradeoff. The article concludes with a summary discussion and directions for future work. Potential applications of MGMRF in spatial information fusion and image analysis are briefly mentioned.
Affiliation(s)
- Ying C MacNab
- School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
6
Wolter KM, Ganesh N, Copeland KR, Singleton JA, Khare M. Estimation tools for reducing the impact of sampling and nonresponse errors in dual-frame RDD telephone surveys. Stat Med 2019; 38:4718-4732. PMID: 31418889; DOI: 10.1002/sim.8329.
Abstract
We discuss alternative estimators of the population total given a dual-frame random-digit-dial (RDD) telephone survey in which samples are selected from landline and cell phone sampling frames. The estimators are subject to sampling and nonsampling errors. To reduce sampling variability when an optimum balance of landline and cell phone samples is not feasible, we develop an application of shrinkage estimation. We demonstrate the implications for survey weighting of a differential nonresponse mechanism by telephone status. We illustrate these ideas using data from the National Immunization Survey-Child, a large dual-frame RDD telephone survey sponsored by the Centers for Disease Control and Prevention and conducted to measure the vaccination status of American children aged 19 to 35 months.
Affiliation(s)
- Kirk M Wolter
- NORC at the University of Chicago, Chicago, Illinois
- N Ganesh
- NORC at the University of Chicago, Chicago, Illinois
- James A Singleton
- National Center for Immunization and Respiratory Diseases, Atlanta, Georgia
- Meena Khare
- National Center for Health Statistics, Hyattsville, Maryland
7
Glimm E. Adjusting for selection bias in assessing treatment effect estimates from multiple subgroups. Biom J 2018; 61:216-229. PMID: 30474240; DOI: 10.1002/bimj.201800097.
Abstract
This paper discusses a number of methods for adjusting treatment effect estimates in clinical trials where differential effects in several subpopulations are suspected. In such situations, the estimates from the most extreme subpopulation are often overinterpreted. The paper focusses on the construction of simultaneous confidence intervals intended to provide a more realistic assessment regarding the uncertainty around these extreme results. The methods from simultaneous inference are compared with shrinkage estimates arising from Bayesian hierarchical models by discussing salient features of both approaches in a typical application.
Affiliation(s)
- Ekkehard Glimm
- Novartis Pharma AG, Novartis Campus, Basel, Switzerland; Otto-von-Guericke University, Institute of Biometry and Medical Informatics, Magdeburg, Germany
8
Hu Z, Dong K, Dai W, Tong T. A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix. Int J Biostat 2017; 13. PMID: 28953454; DOI: 10.1515/ijb-2017-0013.
Abstract
The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. It has many real applications including statistical tests and information theory. Due to the statistical and computational challenges with high dimensionality, little work has been proposed in the literature for estimating the determinant of high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrix. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines based on the sample size, the dimension, and the correlation of the data set for estimating the determinant of high-dimensional covariance matrix. Finally, from a perspective of the loss function, the comparison study in this paper may also serve as a proxy to assess the performance of the covariance matrix estimation.
9
Chan KKW, Xie F, Willan AR, Pullenayegum EM. Conducting EQ-5D Valuation Studies in Resource-Constrained Countries: The Potential Use of Shrinkage Estimators to Reduce Sample Size. Med Decis Making 2017; 38:26-33. PMID: 28823185; DOI: 10.1177/0272989x17725748.
Abstract
BACKGROUND: Resource-constrained countries have difficulty conducting large EQ-5D valuation studies, which limits their ability to conduct cost-utility analyses using a value set specific to their own population. When estimates of similar but related parameters are available, shrinkage estimators reduce uncertainty and yield estimators with smaller mean square error (MSE). We hypothesized that health utilities based on shrinkage estimators can reduce MSE and mean absolute error (MAE) when compared to country-specific health utilities. METHODS: We conducted a simulation study (1,000 iterations) based on the observed means and standard deviations (or standard errors) of the EQ-5D-3L valuation studies from 14 countries. In each iteration, the simulated data were fitted with the model based on the country-specific functional form of the scoring algorithm to create country-specific health utilities ("naïve" estimators). Shrinkage estimators were calculated based on the empirical Bayes estimation methods. The performance of the shrinkage estimators was compared with that of the naïve estimators over a range of different sample sizes based on MSE, MAE, mean bias, standard errors, and the width of confidence intervals. RESULTS: The MSE of the shrinkage estimators was smaller than the MSE of the naïve estimators on average, as theoretically predicted. Importantly, the MAE of the shrinkage estimators was also smaller than the MAE of the naïve estimators on average. In addition, the reduction in MSE with the use of shrinkage estimators did not substantially increase bias. The degree of reduction in uncertainty by shrinkage estimators is most apparent in valuation studies with small sample size. CONCLUSION: Health utilities derived from shrinkage estimation allow valuation studies with small sample size to "borrow strength" from other valuation studies to reduce uncertainty.
Affiliation(s)
- Kelvin K W Chan
- Division of Medical Oncology and Hematology, Sunnybrook Odette Cancer Centre, Toronto, ON, Canada; Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, ON, Canada; Canadian Centre for Applied Research in Cancer Control (ARCC), Toronto, ON, Canada
- Feng Xie
- Department of Clinical Epidemiology & Biostatistics, McMaster University, ON, Canada
- Andrew R Willan
- Child Health Evaluative Sciences, Hospital for Sick Children, Toronto, ON, Canada
- Eleanor M Pullenayegum
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, ON, Canada; Child Health Evaluative Sciences, Hospital for Sick Children, Toronto, ON, Canada
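A minimal sketch of this "borrowing strength" idea using a positive-part James-Stein estimator, assuming (unrealistically, for brevity) a common sampling variance across invented country-level coefficients; the paper's empirical Bayes machinery generalizes this to unequal variances.

```python
import numpy as np

# Invented country-level estimates of one EQ-5D model coefficient, each with
# (roughly) the same sampling variance sigma2 from equally sized studies
x = np.array([0.21, 0.35, 0.28, 0.40, 0.25, 0.31])
sigma2 = 0.05 ** 2
k = len(x)

# Positive-part James-Stein: shrink each country toward the cross-country
# mean; the shrinkage factor grows as the estimates cluster together
xbar = x.mean()
S = np.sum((x - xbar) ** 2)
shrink = max(0.0, 1 - (k - 3) * sigma2 / S)
js = xbar + shrink * (x - xbar)   # "borrowed-strength" country estimates
```

For k >= 4 this estimator dominates the naïve per-country estimates in total MSE, which is the theoretical prediction the paper's simulations confirm for EQ-5D value sets.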
10
Tao Y, Sánchez BN, Mukherjee B. Latent variable models for gene-environment interactions in longitudinal studies with multiple correlated exposures. Stat Med 2015; 34:1227-41. PMID: 25545894; PMCID: PMC4355187; DOI: 10.1002/sim.6401.
Abstract
Many existing cohort studies designed to investigate health effects of environmental exposures also collect data on genetic markers. The Early Life Exposures in Mexico to Environmental Toxicants project, for instance, has been genotyping single nucleotide polymorphisms on candidate genes involved in metal and nutrient metabolism and also in potentially shared metabolic pathways with the environmental exposures. Given the longitudinal nature of these cohort studies, rich exposure and outcome data are available to address novel questions regarding gene-environment interaction (G × E). Latent variable (LV) models have been effectively used for dimension reduction, helping with multiple testing and multicollinearity issues in the presence of correlated multivariate exposures and outcomes. In this paper, we first propose a modeling strategy, based on LV models, to examine the association between repeated outcome measures (e.g., child weight) and a set of correlated exposure biomarkers (e.g., prenatal lead exposure). We then construct novel tests for G × E effects within the LV framework to examine effect modification of outcome-exposure association by genetic factors (e.g., the hemochromatosis gene). We consider two scenarios: one allowing dependence of the LV models on genes and the other assuming independence between the LV models and genes. We combine the two sets of estimates by shrinkage estimation to trade off bias and efficiency in a data-adaptive way. Using simulations, we evaluate the properties of the shrinkage estimates, and in particular, we demonstrate the need for this data-adaptive shrinkage given repeated outcome measures, exposure measures possibly repeated and time-varying gene-environment association.
Affiliation(s)
- Yebin Tao
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, 48109, U.S.A
11
Berger S, Pérez-Rodríguez P, Veturi Y, Simianer H, de los Campos G. Effectiveness of shrinkage and variable selection methods for the prediction of complex human traits using data from distantly related individuals. Ann Hum Genet 2015; 79:122-35. PMID: 25600682; PMCID: PMC4428155; DOI: 10.1111/ahg.12099.
Abstract
Genome-wide association studies (GWAS) have detected large numbers of variants associated with complex human traits and diseases. However, the proportion of variance explained by GWAS-significant single nucleotide polymorphisms has been usually small. This brought interest in the use of whole-genome regression (WGR) methods. However, there has been limited research on the factors that affect prediction accuracy (PA) of WGRs when applied to human data of distantly related individuals. Here, we examine, using real human genotypes and simulated phenotypes, how trait complexity, marker-quantitative trait loci (QTL) linkage disequilibrium (LD), and the model used affect the performance of WGRs. Our results indicated that the estimated rate of missing heritability is dependent on the extent of marker-QTL LD. However, this parameter was not greatly affected by trait complexity. Regarding PA our results indicated that: (a) under perfect marker-QTL LD, WGR can achieve moderately high prediction accuracy, and with simple genetic architectures variable selection methods outperform shrinkage procedures; and (b) under imperfect marker-QTL LD, variable selection methods can achieve reasonably good PA with simple or moderately complex genetic architectures; however, the PA of these methods deteriorated as trait complexity increased, and with highly complex traits variable selection and shrinkage methods both performed poorly. This was confirmed with an analysis of human height.
Affiliation(s)
- Swetlana Berger
- Animal Breeding and Genetics Group, Department of Animal Sciences, Georg-August-University Goettingen, Albrecht-Thaer-Weg 3, Goettingen, Germany
12
Li Z, Hallingbäck HR, Abrahamsson S, Fries A, Gull BA, Sillanpää MJ, García-Gil MR. Functional multi-locus QTL mapping of temporal trends in Scots pine wood traits. G3 (Bethesda) 2014; 4:2365-79. PMID: 25305041; DOI: 10.1534/g3.114.014068.
Abstract
Quantitative trait loci (QTL) mapping of wood properties in conifer species has focused on single time point measurements or on trait means based on heterogeneous wood samples (e.g., increment cores), thus ignoring systematic within-tree trends. In this study, functional QTL mapping was performed for a set of important wood properties in increment cores from a 17-yr-old Scots pine (Pinus sylvestris L.) full-sib family with the aim of detecting wood trait QTL for general intercepts (means) and for linear slopes by increasing cambial age. Two multi-locus functional QTL analysis approaches were proposed and their performances were compared on trait datasets comprising 2 to 9 time points, 91 to 455 individual tree measurements, and genotype datasets of amplified fragment length polymorphism (AFLP) and single nucleotide polymorphism (SNP) markers. The first method was a multilevel LASSO analysis whereby trend parameter estimation and QTL mapping were conducted consecutively; the second method was our Bayesian linear mixed model whereby trends and underlying genetic effects were estimated simultaneously. We also compared several different hypothesis testing methods under either the LASSO or the Bayesian framework to perform QTL inference. In total, five and four significant QTL were observed for the intercepts and slopes, respectively, across wood traits such as earlywood percentage, wood density, radial fiberwidth, and spiral grain angle. Four of these QTL were represented by candidate gene SNPs, thus providing promising targets for future research in QTL mapping and molecular function. Bayesian and LASSO methods both detected similar sets of QTL given datasets that comprised large numbers of individuals.
13
Armagan A, Dunson DB, Lee J. Generalized double Pareto shrinkage. Stat Sin 2013; 23:119-143. PMID: 24478567; PMCID: PMC3903426.
Abstract
We propose a generalized double Pareto prior for Bayesian shrinkage estimation and inferences in linear models. The prior can be obtained via a scale mixture of Laplace or normal distributions, forming a bridge between the Laplace and Normal-Jeffreys' priors. While it has a spike at zero like the Laplace density, it also has a Student's t-like tail behavior. Bayesian computation is straightforward via a simple Gibbs sampling algorithm. We investigate the properties of the maximum a posteriori estimator, as sparse estimation plays an important role in many problems, reveal connections with some well-established regularization procedures, and show some asymptotic results. The performance of the prior is tested through simulations and an application.
Affiliation(s)
- David B Dunson
- Department of Statistical Science, Duke University, Durham, NC 27708, USA
- Jaeyong Lee
- Department of Statistics, Seoul National University, Seoul, 151-747, Korea
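For orientation, the generalized double Pareto density and its scale-mixture representation (the "bridge" between Laplace and normal mixtures mentioned in the abstract) can be sketched as follows; the parameterization follows common write-ups of this prior and should be checked against the paper.

```python
import numpy as np

# Generalized double Pareto density with shape alpha and scale xi = eta/alpha;
# with alpha = eta = 1 this reduces to f(x) = 1 / (2 * (1 + |x|)^2)
def gdp_pdf(x, alpha=1.0, eta=1.0):
    xi = eta / alpha
    return (1.0 / (2.0 * xi)) * (1.0 + np.abs(x) / (alpha * xi)) ** (-(alpha + 1.0))

# Scale-mixture representation: lambda ~ Gamma(alpha, rate=eta),
# tau | lambda ~ Exp(rate = lambda^2 / 2), x | tau ~ N(0, tau)
# marginally gives x ~ GDP(alpha, eta) -- a spike at zero, Student-t-like tails
rng = np.random.default_rng(0)
lam = rng.gamma(shape=1.0, scale=1.0, size=100_000)   # eta = 1 => scale 1/eta
tau = rng.exponential(scale=2.0 / lam**2)
draws = rng.normal(0.0, np.sqrt(tau))
```

The two inner stages (exponential mixing of a normal variance) reproduce a Laplace prior given lambda, and the gamma mixing over lambda fattens the tails, which is what yields the Gibbs sampler the paper describes.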
14
Simmonds MC, Higgins JP, Stewart LA. Random-effects meta-analysis of time-to-event data using the expectation-maximisation algorithm and shrinkage estimators. Res Synth Methods 2012; 4:144-55. PMID: 26053654; DOI: 10.1002/jrsm.1067.
Abstract
Meta-analysis of time-to-event data has proved difficult in the past because consistent summary statistics often cannot be extracted from published results. The use of individual patient data allows for the re-analysis of each study in a consistent fashion and thus makes meta-analysis of time-to-event data feasible. Time-to-event data can be analysed using proportional hazards models, but incorporating random effects into these models is not straightforward in standard software. This paper fits random-effects proportional hazards models by treating the random effects as missing data and applying the expectation-maximisation algorithm. This approach has been used before by using Markov chain Monte Carlo methods to perform the expectation step of the algorithm. In this paper, the expectation step is simplified, without sacrificing accuracy, by approximating the expected values of the random effects using simple shrinkage estimators. This provides a robust method for fitting random-effects models that can be implemented in standard statistical packages.
Affiliation(s)
- Mark C Simmonds
- Centre for Reviews and Dissemination, University of York, York, UK
- Julian PT Higgins
- Centre for Reviews and Dissemination, University of York, York, UK; MRC Biostatistics Unit, Cambridge, UK
- Lesley A Stewart
- Centre for Reviews and Dissemination, University of York, York, UK
15
Endelman JB, Jannink JL. Shrinkage estimation of the realized relationship matrix. G3 (Bethesda) 2012; 2:1405-13. PMID: 23173092; DOI: 10.1534/g3.112.004259.
Abstract
The additive relationship matrix plays an important role in mixed model prediction of breeding values. For genotype matrix X (loci in columns), the product XX′ is widely used as a realized relationship matrix, but the scaling of this matrix is ambiguous. Our first objective was to derive a proper scaling such that the mean diagonal element equals 1+f, where f is the inbreeding coefficient of the current population. The result is a formula involving the covariance matrix for sampling genomic loci, which must be estimated with markers. Our second objective was to investigate whether shrinkage estimation of this covariance matrix can improve the accuracy of breeding value (GEBV) predictions with low-density markers. Using an analytical formula for shrinkage intensity that is optimal with respect to mean-squared error, simulations revealed that shrinkage can significantly increase GEBV accuracy in unstructured populations, but only for phenotyped lines; there was no benefit for unphenotyped lines. The accuracy gain from shrinkage increased with heritability, but at high heritability (> 0.6) this benefit was irrelevant because phenotypic accuracy was comparable. These trends were confirmed in a commercial pig population with progeny-test-estimated breeding values. For an anonymous trait where phenotypic accuracy was 0.58, shrinkage increased the average GEBV accuracy from 0.56 to 0.62 (SE < 0.00) when using random sets of 384 markers from a 60K array. We conclude that when moderate-accuracy phenotypes and low-density markers are available for the candidates of genomic selection, shrinkage estimation of the relationship matrix can improve genetic gain.
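The shrink-toward-a-target idea can be sketched on a simulated genotype matrix. Note this uses a VanRaden-style scaling and a fixed, purely illustrative shrinkage intensity, not the authors' analytically optimal formula.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy genotype matrix: 10 lines x 200 biallelic markers coded 0/1/2 (simulated)
n, m = 10, 200
p = rng.uniform(0.1, 0.9, size=m)           # marker allele frequencies
X = rng.binomial(2, p, size=(n, m)).astype(float)

# VanRaden-style realized relationship matrix (one common scaling choice)
W = X - 2 * p                               # center by twice allele frequency
c = 2 * np.sum(p * (1 - p))
G = W @ W.T / c

# Generic linear shrinkage toward a simple target (scaled identity), with a
# fixed illustrative intensity delta in [0, 1]
delta = 0.3
target = np.eye(n) * np.mean(np.diag(G))
G_shrunk = delta * target + (1 - delta) * G
```

Shrinking the off-diagonal (relationship) entries toward the target damps the marker-sampling noise that dominates with low-density panels, which is the mechanism behind the GEBV accuracy gains reported in the abstract.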
16
Abstract
Many existing cohort studies initially designed to investigate disease risk as a function of environmental exposures have collected genomic data in recent years with the objective of testing for gene-environment interaction (G × E) effects. In environmental epidemiology, interest in G × E arises primarily after a significant effect of the environmental exposure has been documented. Cohort studies often collect rich exposure data; as a result, assessing G × E effects in the presence of multiple exposure markers further increases the burden of multiple testing, an issue already present in both genetic and environment health studies. Latent variable (LV) models have been used in environmental epidemiology to reduce dimensionality of the exposure data, gain power by reducing multiplicity issues via condensing exposure data, and avoid collinearity problems due to presence of multiple correlated exposures. We extend the LV framework to characterize gene-environment interaction in presence of multiple correlated exposures and genotype categories. Further, similar to what has been done in case-control G × E studies, we use the assumption of gene-environment (G-E) independence to boost the power of tests for interaction. The consequences of making this assumption, or the issue of how to explicitly model G-E association has not been previously investigated in LV models. We postulate a hierarchy of assumptions about the LV model regarding the different forms of G-E dependence and show that making such assumptions may influence inferential results on the G, E, and G × E parameters. We implement a class of shrinkage estimators to data adaptively trade-off between the most restrictive to most flexible form of G-E dependence assumption and note that such class of compromise estimators can serve as a benchmark of model adequacy in LV models. 
We demonstrate the methods with an example from the Early Life Exposures in Mexico City to Neuro-Toxicants Study of lead exposure, iron metabolism genes, and birth weight.
Affiliation(s)
- Brisa N Sánchez
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA