1
Bérubé S, Kobayashi T, Wesolowski A, Norris DE, Ruczinski I, Moss WJ, Louis TA. A Bayesian hierarchical model for signal extraction from protein microarrays. Stat Med 2023; 42:1445-1460. [PMID: 36872556 PMCID: PMC11806441 DOI: 10.1002/sim.9680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 11/09/2022] [Accepted: 01/30/2023] [Indexed: 03/07/2023]
Abstract
Protein microarrays are a promising technology for measuring protein levels in serum or plasma samples. Because of their high technical variability and the high variation in protein levels across serum samples in any population, directly answering biological questions of interest using protein microarray measurements is challenging. Analyzing preprocessed data and within-sample ranks of protein levels can mitigate the impact of between-sample variation. As with any analysis, ranks are sensitive to preprocessing, but loss-function-based ranks that accommodate major structural relations and components of uncertainty are very effective. Bayesian modeling with full posterior distributions for quantities of interest produces the most effective ranks. Such Bayesian models have been developed for other assays, for example, DNA microarrays, but the modeling assumptions for these assays are not appropriate for protein microarrays. Consequently, we develop and evaluate a Bayesian model to extract the full posterior distribution of normalized protein levels and associated ranks for protein microarrays, and show that it provides a good fit to data from two studies that use protein microarrays produced by different manufacturing processes. We validate the model via simulation and demonstrate the downstream impact of using estimates from this model to obtain optimal ranks.
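The following is a minimal sketch of loss-function-based ranking computed from posterior output. It assumes that posterior draws of normalized protein levels for one serum sample are already available; the draws below are invented and the function names are hypothetical. Under squared-error loss on the rank scale, the optimal estimates are the posterior mean ranks, and ranking those means yields integer ranks. The sketch illustrates the general principle only, not the model developed in the paper.

```python
from typing import List


def ranks_within_draw(values: List[float]) -> List[int]:
    """Rank units within one posterior draw (1 = smallest); ties are broken by unit index."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for position, unit in enumerate(order, start=1):
        ranks[unit] = position
    return ranks


def posterior_mean_ranks(draws: List[List[float]]) -> List[float]:
    """Average each unit's within-draw rank over all posterior draws (rows = draws)."""
    totals = [0.0] * len(draws[0])
    for draw in draws:
        for unit, rank in enumerate(ranks_within_draw(draw)):
            totals[unit] += rank
    return [t / len(draws) for t in totals]


def sel_optimal_ranks(draws: List[List[float]]) -> List[int]:
    """Integer ranks minimizing posterior expected squared error on the rank scale:
    rank the posterior mean ranks."""
    return ranks_within_draw(posterior_mean_ranks(draws))


# Hypothetical posterior draws: 4 draws (rows) of normalized levels for 3 proteins (columns).
draws = [
    [0.1, 0.5, 0.4],
    [0.2, 0.3, 0.6],
    [0.0, 0.7, 0.5],
    [0.3, 0.2, 0.9],
]
print(posterior_mean_ranks(draws))  # [1.25, 2.25, 2.5]
print(sel_optimal_ranks(draws))     # [1, 2, 3]
```

The mean ranks keep the uncertainty visible: protein 1 is clearly lowest, but proteins 2 and 3 are nearly exchangeable even though the integer ranks separate them.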
Affiliation(s)
- Sophie Bérubé
- Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
- Tamaki Kobayashi
- Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
- Amy Wesolowski
- Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
- Douglas E. Norris
- Department of Molecular Microbiology and Immunology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
- Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
- William J. Moss
- Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
- Department of Molecular Microbiology and Immunology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
- Thomas A. Louis
- Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
2
Haneuse S, Schrag D, Dominici F, Normand SL, Lee KH. Measuring performance for end-of-life care. Ann Appl Stat 2022; 16:1586-1607. [PMID: 36483542 PMCID: PMC9728673 DOI: 10.1214/21-aoas1558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Abstract
Although not without controversy, readmission is entrenched as a hospital quality metric, with statistical analyses generally based on fitting a logistic-Normal generalized linear mixed model. Such analyses, however, ignore death as a competing risk, although doing so for clinical conditions with high mortality can have profound effects; a hospital's seemingly good performance for readmission may be an artifact of it having poor performance for mortality. In this paper we propose novel multivariate hospital-level performance measures for readmission and mortality that derive from framing the analysis as one of cluster-correlated semi-competing risks data. We also consider a number of profiling-related goals, including the identification of extreme performers and a bivariate classification of whether the hospital has higher-/lower-than-expected readmission and mortality rates, via a Bayesian decision-theoretic approach that characterizes hospitals on the basis of minimizing the posterior expected loss for an appropriate loss function. In some settings, particularly if the number of hospitals is large, the computational burden may be prohibitive. To resolve this, we propose a series of analysis strategies that will be useful in practice. Throughout, the methods are illustrated with data from CMS on N = 17,685 patients diagnosed with pancreatic cancer between 2000 and 2012 at one of J = 264 hospitals in California.
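This sketch makes the bivariate classification concrete. Suppose the loss is a simple 0/1 loss over the four categories: readmission higher or lower than expected, crossed with mortality higher or lower than expected. The Bayes rule then assigns each hospital the category with the largest posterior probability. The sketch assumes that posterior draws of hospital-specific readmission and mortality ratios are available and treats a ratio of 1 as "expected". The draws, labels, and function names are hypothetical, and the loss functions used in the paper may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple


def quadrant(readmission_ratio: float, mortality_ratio: float, reference: float = 1.0) -> str:
    """'R+'/'R-': readmission higher/lower than expected; 'M+'/'M-': same for mortality."""
    r = "R+" if readmission_ratio > reference else "R-"
    m = "M+" if mortality_ratio > reference else "M-"
    return r + m


def classify_hospital(draws: List[Tuple[float, float]]) -> Tuple[str, Dict[str, float]]:
    """Bayes rule under 0/1 loss: choose the category with the largest posterior probability."""
    counts = Counter(quadrant(r, m) for r, m in draws)
    probs = {q: counts[q] / len(draws) for q in ("R+M+", "R+M-", "R-M+", "R-M-")}
    return max(probs, key=probs.get), probs


# Hypothetical posterior draws of (readmission ratio, mortality ratio) for one hospital.
draws = [(0.8, 1.3), (0.9, 1.1), (1.05, 1.2), (0.7, 0.95), (0.85, 1.4)]
label, probs = classify_hospital(draws)
print(label)  # R-M+
```

In this toy example, readmission looks favorable, but most of the posterior mass lies in the lower-readmission, higher-mortality cell. This is the pattern the abstract warns about.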
Affiliation(s)
- Sebastien Haneuse
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Deborah Schrag
- Division of Population Sciences, Dana-Farber Cancer Institute
- Kyu Ha Lee
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
3
Al Mohamad D, van Zwet E, Solari A, Goeman J. Simultaneous confidence intervals for ranks using the partitioning principle. Electron J Stat 2021. [DOI: 10.1214/21-ejs1847] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Affiliation(s)
- Diaa Al Mohamad
- Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
- Erik van Zwet
- Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
- Aldo Solari
- University of Milano-Bicocca, 1 Piazza dell’Ateneo Nuovo, 20126 Milano, Italy
- Jelle Goeman
- Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
4
Ferguson J, Chang J. An empirical Bayesian ranking method, with applications to high throughput biology. Bioinformatics 2020; 36:177-185. [PMID: 31197345 DOI: 10.1093/bioinformatics/btz471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/04/2018] [Revised: 04/30/2019] [Accepted: 06/05/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION In bioinformatics, genome-wide experiments look for important biological differences between two groups at a large number of locations in the genome. Often, the final analysis focuses on a P-value-based ranking of locations which might then be investigated further in follow-up experiments. However, this strategy may result in small effect sizes, with low P-values, being ranked more favorably than larger, more scientifically important effects. Bayesian ranking techniques may offer a solution to this problem provided a good prior distribution for the collective distribution of effect sizes is available. RESULTS We develop an Empirical Bayes ranking algorithm, using the marginal distribution of the data over all locations to estimate an appropriate prior. In simulations and analysis using real datasets, we demonstrate favorable performance compared to ordering P-values and a number of other competing ranking methods. The algorithm is computationally efficient and can be used to rank the entirety of genomic locations or to rank a subset of locations, pre-selected via traditional FWER/FDR methods in a 2-stage analysis. AVAILABILITY AND IMPLEMENTATION An R-package, EBrank, implementing the ranking algorithm is available on CRAN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
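Below is a generic normal-normal empirical Bayes sketch of the contrast drawn here. It is not the EBrank algorithm. Each effect estimate x_i has a known standard error s_i. The prior is N(0, τ²), with τ² estimated from the marginal distribution of all estimates by the method of moments. Locations are then ranked by posterior mean. The data and function names are hypothetical.

```python
from statistics import fmean
from typing import List, Tuple


def eb_posterior_means(effects: List[float], ses: List[float]) -> Tuple[float, List[float]]:
    """Normal-normal empirical Bayes with a N(0, tau^2) prior and known standard errors.

    Marginally x_i ~ N(0, tau^2 + s_i^2), so a method-of-moments estimate is
    tau^2 = max(0, mean(x_i^2) - mean(s_i^2)). Assumes every s_i > 0.
    """
    tau2 = max(0.0, fmean([x * x for x in effects]) - fmean([s * s for s in ses]))
    shrunk = [tau2 / (tau2 + s * s) * x for x, s in zip(effects, ses)]
    return tau2, shrunk


# Hypothetical effect estimates and standard errors for four genomic locations.
names = ["A", "B", "C", "D"]
effects = [0.5, 3.0, 2.0, 0.2]
ses = [0.1, 1.0, 0.8, 0.3]

tau2, post = eb_posterior_means(effects, ses)
idx = range(len(names))
# Smaller two-sided P-value <=> larger |z|, so sorting by |z| descending mimics P-value ranking.
by_pvalue = [names[i] for i in sorted(idx, key=lambda i: abs(effects[i]) / ses[i], reverse=True)]
by_posterior = [names[i] for i in sorted(idx, key=lambda i: abs(post[i]), reverse=True)]
print(by_pvalue)     # ['A', 'B', 'C', 'D']
print(by_posterior)  # ['B', 'C', 'A', 'D']
```

Location A has a tiny but precisely estimated effect, so it tops the P-value list. After shrinkage it falls behind B and C, whose effects are larger.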
Affiliation(s)
- John Ferguson
- Biostatistics Division, HRB Clinical Research Facility, National University of Ireland Galway, Galway, Ireland
- Joseph Chang
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA
5
Jewett PI, Zhu L, Huang B, Feuer EJ, Gangnon RE. Optimal Bayesian point estimates and credible intervals for ranking with application to county health indices. Stat Methods Med Res 2018; 28:2876-2891. [PMID: 30062909 DOI: 10.1177/0962280218790104] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
It is fairly common to rank different geographic units, e.g. counties in the USA, based on health indices. In a typical application, point estimates of the health indices are obtained for each county, and the indices are then simply ranked as if they were known constants. Several authors have considered optimal rank estimators under squared error loss on the rank scale as a default method for general purpose ranking, e.g. situations where ranking units across the full spectrum of performance (low, medium, high) is important. While computationally convenient, squared error loss on the rank scale may not represent the true inferential goals of rank consumers. We construct alternative loss functions based on three components: (1) the inferential goal (rank position or pairwise comparisons), (2) the scale (original, log-transformed or rank) and (3) the (positional or pairwise) loss function (0/1, squared error or absolute error). We can obtain optimal ranks for loss functions based on rank positions and nearly optimal ranks for loss functions based on pairwise comparisons paired with highest posterior density (HPD) credible intervals. We compare inferences produced by the various ranking methods, both optimal and heuristic, using low birth weight data for counties in the Midwestern United States, from 2006 to 2012.
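Consider a loss built from rank positions, such as a 0/1 loss that counts units whose reported rank differs from their true rank. The expected loss is K minus the sum over units of P(R_k = r_k | data). The optimal ranking is therefore the permutation that maximizes these summed posterior probabilities. The brute-force sketch below works for a handful of units and assumes posterior draws are available. The data and function names are illustrative; realistic problem sizes would call for a linear-assignment solver instead of enumerating permutations.

```python
from itertools import permutations
from typing import List, Tuple


def ranks_within_draw(values: List[float]) -> List[int]:
    """Rank units within one draw (1 = smallest); ties broken by unit index."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for position, unit in enumerate(order, start=1):
        ranks[unit] = position
    return ranks


def rank_probabilities(draws: List[List[float]]) -> List[List[float]]:
    """probs[k][r - 1] = posterior probability that unit k has rank r."""
    n = len(draws[0])
    probs = [[0.0] * n for _ in range(n)]
    for draw in draws:
        for unit, rank in enumerate(ranks_within_draw(draw)):
            probs[unit][rank - 1] += 1 / len(draws)
    return probs


def zero_one_optimal_ranks(draws: List[List[float]]) -> Tuple[int, ...]:
    """Permutation maximizing sum_k P(R_k = r_k | data), i.e. minimizing the expected
    number of units whose reported rank differs from their true rank.
    Brute force over all permutations, so only practical for a few units."""
    probs = rank_probabilities(draws)
    n = len(probs)
    return max(permutations(range(1, n + 1)),
               key=lambda perm: sum(probs[k][perm[k] - 1] for k in range(n)))


# Hypothetical posterior draws: 4 draws (rows) for 3 counties (columns).
draws = [
    [0.1, 0.5, 0.4],
    [0.2, 0.3, 0.6],
    [0.0, 0.7, 0.5],
    [0.3, 0.2, 0.9],
]
print(zero_one_optimal_ranks(draws))  # (1, 3, 2)
```

For these draws, ranking the posterior mean ranks (the squared-error-optimal choice) would instead give [1, 2, 3]. The inferential goal encoded in the loss therefore changes the reported ranks.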
Affiliation(s)
- Patricia I Jewett
- 1 Department of Population Health Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Li Zhu
- 2 Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Bin Huang
- 3 Department of Biostatistics, University of Kentucky, Lexington, KY, USA
- Eric J Feuer
- 2 Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Ronald E Gangnon
- 1 Department of Population Health Sciences, University of Wisconsin-Madison, Madison, WI, USA
- 4 Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
6
de la Guardia FH, Hwang J, Adams JL, Paddock SM. Loss function-based evaluation of physician report cards. Health Serv Outcomes Res Methodol 2018. [DOI: 10.1007/s10742-018-0179-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
7
Adams JL, Paddock SM. Misclassification Risk of Tier-Based Physician Quality Performance Systems. Health Serv Res 2016; 52:1277-1296. [PMID: 27714791 DOI: 10.1111/1475-6773.12561] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/07/2023]
Abstract
OBJECTIVE There is increasing interest in identifying high-quality physicians, such as whether physicians perform above or below a threshold level. To evaluate whether current methods accurately distinguish above- versus below-threshold physicians, we estimate misclassification rates for two-category identification systems. DATA SOURCES Claims data for Medicare fee-for-service beneficiaries residing in Florida or New York in 2010. STUDY DESIGN Estimate colorectal cancer, glaucoma, and diabetes quality scores for 23,085 physicians. Use a beta-binomial model to estimate physician score reliabilities. Compute the proportion of physicians whose performance tier would be misclassified under three scoring systems. PRINCIPAL FINDINGS In the three scoring systems, misclassification ranges were 8.6-25.7 percent, 6.4-22.8 percent, and 4.5-21.7 percent. True positive rate ranges were 72.9-97.0 percent, 83.4-100.0 percent, and 34.7-88.2 percent. True negative rate ranges were 68.5-91.6 percent, 10.5-92.4 percent, and 81.1-99.9 percent. Positive predictive value ranges were 70.5-91.6 percent, 77.0-97.3 percent, and 55.2-99.1 percent. CONCLUSIONS Current methods for profiling physicians on quality may produce misleading results, as the number of eligible events is typically small. Misclassification is a policy-relevant measure of the potential impact of tiering on providers, payers, and patients. Quantifying misclassification rates should inform the construction of high-performance networks and quality improvement initiatives.
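The link between a small number of eligible events and misclassification can be illustrated by simulation. Each physician's true rate is drawn from a Beta(a, b) distribution, and an observed count is drawn from a binomial with n eligible events. Both the true and observed rates are classified against the same threshold, and disagreements are counted. Under this model the usual reliability, Var(p) / (Var(p) + E[p(1 - p)]/n), simplifies to n / (n + a + b). The values of a, b, n, and the threshold below are assumptions chosen for illustration, not values from the study.

```python
import random


def beta_binomial_reliability(n_events: int, a: float, b: float) -> float:
    """Reliability of an observed rate from n_events when p ~ Beta(a, b):
    Var(p) / (Var(p) + E[p(1 - p)] / n_events), which simplifies to n / (n + a + b)."""
    return n_events / (n_events + a + b)


def simulated_misclassification(a: float, b: float, n_events: int, threshold: float,
                                n_physicians: int = 10000, seed: int = 1) -> float:
    """Monte Carlo fraction of physicians whose observed rate falls on the other side
    of `threshold` from their true rate."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_physicians):
        p = rng.betavariate(a, b)  # true quality score
        successes = sum(rng.random() < p for _ in range(n_events))  # Binomial(n_events, p)
        observed = successes / n_events
        if (p >= threshold) != (observed >= threshold):
            wrong += 1
    return wrong / n_physicians


# Hypothetical setting: mean score 0.8, tier threshold 0.8.
print(beta_binomial_reliability(10, 8, 2))  # 0.5
print(simulated_misclassification(8, 2, n_events=10, threshold=0.8))
print(simulated_misclassification(8, 2, n_events=200, threshold=0.8))
```

With only 10 events (reliability 0.5), a much larger share of physicians lands in the wrong tier than with 200 events.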
Affiliation(s)
- John L Adams
- Center for Effectiveness & Safety Research, Kaiser Permanente, Pasadena, CA
8
Henderson NC, Newton MA. Making the cut: improved ranking and selection for large-scale inference. J R Stat Soc Series B Stat Methodol 2016; 78:781-804. [PMID: 27570475 PMCID: PMC4996506 DOI: 10.1111/rssb.12131] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022]
Abstract
Identifying leading measurement units from a large collection is a common inference task in various domains of large-scale inference. Testing approaches, which measure evidence against a null hypothesis rather than effect magnitude, tend to overpopulate lists of leading units with those associated with low measurement error. By contrast, local maximum likelihood (ML) approaches tend to favor units with high measurement error. Available Bayesian and empirical Bayesian approaches rely on specialized loss functions that result in similar deficiencies. We describe and evaluate a generic empirical Bayesian ranking procedure that populates the list of top units in a way that maximizes the expected overlap between the true and reported top lists for all list sizes. The procedure relates unit-specific posterior upper tail probabilities with their empirical distribution to yield a ranking variable. It discounts high-variance units less than popular non-ML methods and thus achieves improved operating characteristics in the models considered.
Affiliation(s)
- Michael A Newton
- Departments of Statistics and of Biostatistics and Medical Informatics, University of Wisconsin, Madison, USA
9
Sosunov EA, Egorova NN, Lin HM, McCardle K, Sharma V, Gelijns AC, Moskowitz AJ. The Impact of Hospital Size on CMS Hospital Profiling. Med Care 2016; 54:373-9. [DOI: 10.1097/mlr.0000000000000476] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
10
Hubbard RA, Benjamin-Johnson R, Onega T, Smith-Bindman R, Zhu W, Fenton JJ. Classification accuracy of claims-based methods for identifying providers failing to meet performance targets. Stat Med 2014; 34:93-105. [PMID: 25302935 DOI: 10.1002/sim.6318] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/21/2013] [Revised: 08/15/2014] [Accepted: 09/14/2014] [Indexed: 11/09/2022]
Abstract
Quality assessment is critical for healthcare reform, but data sources are lacking for measurement of many important healthcare outcomes. With over 49 million people covered by Medicare as of 2010, Medicare claims data offer a potentially valuable source that could be used in targeted health care quality improvement efforts. However, little is known about the operating characteristics of provider profiling methods using claims-based outcome measures that may estimate provider performance with error. Motivated by the example of screening mammography performance, we compared approaches to identifying providers failing to meet guideline targets using Medicare claims data. We used data from the Breast Cancer Surveillance Consortium and linked Medicare claims to compare claims-based and clinical estimates of cancer detection rate. We then demonstrated the performance of claims-based estimates across a broad range of operating characteristics using simulation studies. We found that identification of poor performing providers was extremely sensitive to algorithm specificity, with no approach identifying more than 65% of poor performing providers when claims-based measures had specificity of 0.995 or less. We conclude that claims have the potential to contribute important information on healthcare outcomes to quality improvement efforts. However, to achieve this potential, development of highly accurate claims-based outcome measures should remain a priority.
Affiliation(s)
- Rebecca A Hubbard
- Group Health Research Institute, Seattle, WA, U.S.A.; Department of Biostatistics, University of Washington, Seattle, WA, U.S.A
11
He Y, Selck F, Normand SLT. On the accuracy of classifying hospitals on their performance measures. Stat Med 2014; 33:1081-103. [PMID: 24122879 PMCID: PMC6400472 DOI: 10.1002/sim.6012] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/13/2012] [Revised: 09/17/2013] [Accepted: 09/19/2013] [Indexed: 11/11/2022]
Abstract
The evaluation, comparison, and public reporting of health care provider performance is essential to improving the quality of health care. Hospitals, as one type of provider, are often classified into quality tiers (e.g., top or suboptimal) based on their performance data for various purposes. However, potential misclassification might lead to detrimental effects for both consumers and payers. Although such risk has been highlighted by applied health services researchers, a systematic investigation of statistical approaches has been lacking. We assess and compare the expected accuracy of several commonly used classification methods: unadjusted hospital-level averages, shrinkage estimators under a random-effects model accommodating between-hospital variation, and two others based on posterior probabilities. Assuming that performance data follow a classic one-way random-effects model with unequal sample size per hospital, we derive accuracy formulae for these classification approaches and gain insight into how the misclassification might be affected by various factors such as reliability of the data, hospital-level sample size distribution, and cutoff values between quality tiers. The case of binary performance data is also explored using Monte Carlo simulation strategies. We apply the methods to real data and discuss the practical implications.
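Consider the one-way random-effects model y_ij = μ + α_i + ε_ij, with α_i ~ N(0, τ²) and ε_ij ~ N(0, σ²). The shrinkage estimate of hospital i's mean pulls its unadjusted average toward μ by the factor B_i = (σ²/n_i) / (σ²/n_i + τ²), so small hospitals move the most. This sketch treats μ, τ², and σ² as known, although in practice they are estimated. All numbers and the tier cutoff are hypothetical.

```python
def shrinkage_estimate(ybar: float, n: int, mu: float, tau2: float, sigma2: float) -> float:
    """Posterior mean of a hospital's true level under the one-way random-effects model.
    B = (sigma2 / n) / (sigma2 / n + tau2) is the weight placed on the overall mean mu."""
    b = (sigma2 / n) / (sigma2 / n + tau2)
    return b * mu + (1 - b) * ybar


def tier(estimate: float, cutoff: float) -> str:
    """Two-tier classification against a fixed cutoff (higher = worse, by assumption)."""
    return "suboptimal" if estimate > cutoff else "acceptable"


# Hypothetical values: overall mean 0.10, between-hospital variance 0.0004,
# within-hospital variance 0.09, tier cutoff 0.13; both hospitals observe 0.16.
mu, tau2, sigma2, cutoff = 0.10, 0.0004, 0.09, 0.13
for n in (25, 900):
    est = shrinkage_estimate(0.16, n, mu, tau2, sigma2)
    print(n, tier(0.16, cutoff), round(est, 3), tier(est, cutoff))
# 25 suboptimal 0.106 acceptable
# 900 suboptimal 0.148 suboptimal
```

Both hospitals have the same unadjusted average, but only the small one changes tier after shrinkage. This is why the hospital-level sample size distribution affects classification accuracy.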
Affiliation(s)
- Yulei He
- Office of Research and Methodology, National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD 20782, U.S.A
12
Yang X, Peng B, Chen R, Zhang Q, Zhu D, Zhang QJ, Xue F, Qi L. Statistical profiling methods with hierarchical logistic regression for healthcare providers with binary outcomes. J Appl Stat 2013. [DOI: 10.1080/02664763.2013.830086] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
13
Hierarchical Rank Aggregation with Applications to Nanotoxicology. J Agric Biol Environ Stat 2013; 18:159-177. [PMID: 24839387 DOI: 10.1007/s13253-013-0129-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 10/27/2022]
Abstract
The development of high throughput screening (HTS) assays in the field of nanotoxicology provides new opportunities for the hazard assessment and ranking of engineered nanomaterials (ENMs). It is often necessary to rank lists of materials based on multiple risk assessment parameters, often aggregated across several measures of toxicity and possibly spanning an array of experimental platforms. Bayesian models coupled with the optimization of loss functions have been shown to provide an effective framework for conducting inference on ranks. In this article we present various loss-function-based ranking approaches for comparing ENMs within experiments and toxicity parameters. Additionally, we propose a framework for the aggregation of ranks across different sources of evidence while allowing for differential weighting of this evidence based on its reliability and importance in risk ranking. We apply these methods to high throughput toxicity data on two human cell-lines, exposed to eight different nanomaterials, and measured in relation to four cytotoxicity outcomes. This article has supplementary material online.
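Weighting evidence by reliability and importance can be sketched as a weighted mean of per-outcome ranks. This is far simpler than the hierarchical approach described here. The outcome names, material labels, ranks, and weights below are hypothetical, and every source is assumed to rank the same set of materials.

```python
from typing import Dict


def aggregate_ranks(ranks_by_source: Dict[str, Dict[str, int]],
                    weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted mean rank of each material across sources (rank 1 = highest concern)."""
    total_weight = sum(weights[s] for s in ranks_by_source)
    items = list(next(iter(ranks_by_source.values())))
    return {
        item: sum(weights[s] * ranks[item] for s, ranks in ranks_by_source.items()) / total_weight
        for item in items
    }


# Hypothetical ranks of three materials under two toxicity outcomes.
ranks_by_source = {
    "outcome_1": {"ENM_a": 1, "ENM_b": 2, "ENM_c": 3},
    "outcome_2": {"ENM_a": 3, "ENM_b": 1, "ENM_c": 2},
}
weights = {"outcome_1": 0.75, "outcome_2": 0.25}  # outcome_1 judged more reliable
scores = aggregate_ranks(ranks_by_source, weights)
print(sorted(scores, key=scores.get))  # ['ENM_a', 'ENM_b', 'ENM_c']
```

Shifting weight toward outcome_2 would move ENM_b ahead of ENM_a, so the weights matter as much as the per-outcome ranks.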
14
Noma H, Matsui S. Empirical Bayes ranking and selection methods via semiparametric hierarchical mixture models in microarray studies. Stat Med 2012; 32:1904-16. [PMID: 23281021 DOI: 10.1002/sim.5718] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 08/30/2011] [Accepted: 12/06/2012] [Indexed: 11/07/2022]
Abstract
The main purpose of microarray studies is screening of differentially expressed genes as candidates for further investigation. Because of limited resources at this stage, prioritizing genes is a relevant statistical task in microarray studies. For effective gene selection, parametric empirical Bayes methods for ranking and selection of genes with the largest effect sizes have been proposed (Noma et al., 2010; Biostatistics 11: 281-289). The hierarchical mixture model incorporates the differential and non-differential components and allows information borrowing across differential genes with separation from nuisance, non-differential genes. In this article, we develop empirical Bayes ranking methods via a semiparametric hierarchical mixture model. A nonparametric prior distribution, rather than parametric prior distributions, for effect sizes is specified and estimated using the "smoothing by roughening" approach of Laird and Louis (1991; Computational statistics and data analysis 12: 27-37). We present applications to childhood and infant leukemia clinical studies with microarrays for exploring genes related to prognosis or disease progression.
Affiliation(s)
- Hisashi Noma
- Department of Data Science, The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo, 190-8562, Japan.
15
Ginestet CE, Best NG, Richardson S. Classification loss function for parameter ensembles in Bayesian hierarchical models. Stat Probab Lett 2012. [DOI: 10.1016/j.spl.2011.12.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
16
Ginestet CE, Nichols TE, Bullmore ET, Simmons A. Brain network analysis: separating cost from topology using cost-integration. PLoS One 2011; 6:e21570. [PMID: 21829437 PMCID: PMC3145634 DOI: 10.1371/journal.pone.0021570] [Citation(s) in RCA: 141] [Impact Index Per Article: 10.1] [Received: 12/14/2010] [Accepted: 06/04/2011] [Indexed: 11/18/2022]
Abstract
A statistically principled way of conducting brain network analysis is still lacking. Comparison of different populations of brain networks is hard because topology is inherently dependent on wiring cost, where cost is defined as the number of edges in an unweighted graph. In this paper, we evaluate the benefits and limitations associated with using cost-integrated topological metrics. Our focus is on comparing populations of weighted undirected graphs that differ in mean association weight, using global efficiency. Our key result shows that integrating over cost is equivalent to controlling for any monotonic transformation of the weight set of a weighted graph. That is, when integrating over cost, we eliminate the differences in topology that may be due to a monotonic transformation of the weight set. Our result holds for any unweighted topological measure, and for any choice of distribution over cost levels. Cost-integration is therefore helpful in disentangling differences in cost from differences in topology. By contrast, we show that the use of the weighted version of a topological metric is generally not a valid approach to this problem. Indeed, we prove that, under weak conditions, the use of the weighted version of global efficiency is equivalent to simply comparing weighted costs. Thus, we recommend the reporting of (i) differences in weighted costs and (ii) differences in cost-integrated topological measures with respect to different distributions over the cost domain. We demonstrate the application of these techniques in a re-analysis of an fMRI working memory task. We also provide a Monte Carlo method for approximating cost-integrated topological measures. Finally, we discuss the limitations of integrating topology over cost, which may pose problems when some weights are zero, when multiplicities exist in the ranks of the weights, and when one expects subtle cost-dependent topological differences, which could be masked by cost-integration.
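The key result, that integrating over cost controls for monotonic transformations of the weights, follows because thresholding a weighted graph to its K strongest edges depends only on the ordering of the weights. The sketch below thresholds at every cost level K (number of edges) and computes unweighted global efficiency, the mean of 1/d(i, j) over node pairs with 1/∞ = 0. It then averages across cost levels using a uniform distribution. The uniform distribution, the toy graph, and the absence of tied weights are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def global_efficiency(n_nodes: int, edges: List[Edge]) -> float:
    """Mean of 1/d(i, j) over ordered node pairs of an unweighted graph (1/inf = 0)."""
    adj: Dict[int, List[int]] = {v: [] for v in range(n_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total = 0.0
    for source in range(n_nodes):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives shortest path lengths
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n_nodes * (n_nodes - 1))


def cost_integrated_efficiency(n_nodes: int, weights: Dict[Edge, float]) -> float:
    """Average global efficiency over cost levels K = 1..(number of weighted edges),
    keeping the K strongest edges at each level (uniform distribution over cost).
    Assumes no ties among the weights."""
    strongest_first = sorted(weights, key=weights.get, reverse=True)
    levels = range(1, len(strongest_first) + 1)
    return sum(global_efficiency(n_nodes, strongest_first[:k]) for k in levels) / len(levels)


# Hypothetical association weights on a 4-node graph.
w = {(0, 1): 0.9, (1, 2): 0.7, (2, 3): 0.5, (0, 2): 0.3, (1, 3): 0.2, (0, 3): 0.1}
w_cubed = {e: x ** 3 for e, x in w.items()}  # a monotone increasing transformation
print(cost_integrated_efficiency(4, w) == cost_integrated_efficiency(4, w_cubed))  # True
```

Cubing the weights changes the mean association weight substantially, but it leaves the cost-integrated measure unchanged.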
Affiliation(s)
- Cedric E Ginestet
- Department of Neuroimaging, Institute of Psychiatry, King's College London, London, United Kingdom.
17
Thall PF, Liu DD, Berrak SG, Wolff JE. Defining and ranking effects of individual agents based on survival times of cancer patients treated with combination chemotherapies. Stat Med 2011; 30:1777-94. [PMID: 21590700 DOI: 10.1002/sim.4249] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/17/2010] [Accepted: 02/28/2011] [Indexed: 11/07/2022]
Abstract
An important problem in oncology is comparing chemotherapy (chemo) agents in terms of their effects on survival or progression-free survival time. When the goal is to evaluate individual agents, a difficulty commonly encountered with observational data is that many patients receive a chemo combination including two or more agents. Because agents given in combination may interact, quantifying the contribution of each individual agent to the combination's overall effect is problematic. Still, if on average combinations including a particular agent confer longer survival, then that agent may be considered superior to agents whose combinations confer shorter survival. Motivated by this idea, we propose a definition of individual agent effects based on observational survival data from patients treated with many different chemo combinations. We define an individual agent effect as the average of the effects of the chemo combinations that include the agent. Similarly, we define the effect of each pair of agents as the average of the effects of the combinations including the pair. Under a Bayesian regression model for survival time in which the chemo combination effects follow a hierarchical structure, these definitions are used as a basis for estimating the posterior effects and ranks of the individual agents, and of all pairs of agents. The methods are illustrated by a data set arising from 224 pediatric brain tumor patients treated with over 27 different chemo combinations involving seven chemo agents.
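Once combination effects are in hand, the definitions reduce to simple averages. An agent's effect is the mean of the effects of all combinations that include it, and a pair's effect is the mean over combinations that include both agents. In the paper these quantities are tied to posterior estimates under a hierarchical survival model. The sketch below applies the definitions to a single hypothetical set of combination effects; applying it draw by draw to posterior samples would yield posterior distributions of agent effects. Agent labels and function names are illustrative.

```python
from statistics import fmean
from typing import Dict, FrozenSet, Optional

Combo = FrozenSet[str]


def agent_effect(combo_effects: Dict[Combo, float], agent: str) -> Optional[float]:
    """Mean effect over all combinations that include `agent` (None if it never appears)."""
    values = [eff for combo, eff in combo_effects.items() if agent in combo]
    return fmean(values) if values else None


def pair_effect(combo_effects: Dict[Combo, float], a: str, b: str) -> Optional[float]:
    """Mean effect over all combinations that include both `a` and `b`."""
    values = [eff for combo, eff in combo_effects.items() if a in combo and b in combo]
    return fmean(values) if values else None


# Hypothetical combination effects (e.g., one posterior draw; larger = longer survival).
combo_effects = {
    frozenset({"A", "B"}): 0.2,
    frozenset({"A", "C"}): -0.1,
    frozenset({"B", "C"}): 0.4,
    frozenset({"A", "B", "C"}): 0.1,
}
effects = {agent: agent_effect(combo_effects, agent) for agent in "ABC"}
print(sorted(effects, key=effects.get, reverse=True))  # ['B', 'C', 'A']
print(pair_effect(combo_effects, "A", "B"))
```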
Affiliation(s)
- Peter F Thall
- Department of Biostatistics, M.D. Anderson Cancer Center, Houston, TX, U.S.A.
18
Li H, Graubard BI, Gail MH. Covariate Adjustment and Ranking Methods to Identify Regions with High and Low Mortality Rates. Biometrics 2010; 66:613-20. [DOI: 10.1111/j.1541-0420.2009.01284.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
19
Louis TA, Ruczinski I. Efficient evaluation of ranking procedures when the number of units is large, with application to SNP identification. Biom J 2010; 52:34-49. [PMID: 20131327 DOI: 10.1002/bimj.200900044] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
Simulation-based assessment is a popular and frequently necessary approach for evaluating statistical procedures. Sometimes overlooked is the ability to take advantage of underlying mathematical relations, and we focus on this aspect. We show how to take advantage of large-sample theory when conducting a simulation, using the analysis of genomic data as a motivating example. The approach uses convergence results to provide an approximation to smaller-sample results, results that are available only by simulation. We consider evaluating and comparing various ranking-based methods for identifying the most highly associated SNPs in a genome-wide association study, derive integral equation representations of the pre-posterior distribution of percentiles produced by three ranking methods, and provide examples comparing performance. These results are of interest in their own right and set the framework for a more extensive set of comparisons.
Affiliation(s)
- Thomas A Louis
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
20
Noma H, Matsui S, Omori T, Sato T. Bayesian ranking and selection methods using hierarchical mixture models in microarray studies. Biostatistics 2009; 11:281-9. [PMID: 19946026 DOI: 10.1093/biostatistics/kxp047] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]
Abstract
The main purpose of microarray studies is screening to identify differentially expressed genes as candidates for further investigation. Because of limited resources in this stage, prioritizing or ranking genes is a relevant statistical task in microarray studies. In this article, we develop 3 empirical Bayes methods for gene ranking on the basis of differential expression, using hierarchical mixture models. These methods are based on (i) minimizing mean squared errors of estimation for parameters, (ii) minimizing mean squared errors of estimation for ranks of parameters, and (iii) maximizing sensitivity in selecting prespecified numbers of differential genes, with the largest effect. Our methods incorporate the mixture structures of differential and nondifferential components in empirical Bayes models to allow information borrowing across differential genes, with separation from nuisance, nondifferential genes. The accuracy of our ranking methods is compared with that of conventional methods through simulation studies. An application to a clinical study for breast cancer is provided.
Affiliation(s)
- Hisashi Noma
- Department of Biostatistics, Kyoto University School of Public Health, Yoshida Konoe-cho, Sakyo-ku, Kyoto, Japan.
21
Louis TA. Discussion of Likelihood Inference for Models with Unobservables: Another View. Stat Sci 2009. [DOI: 10.1214/09-sts277a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
22
Xie M, Singh K, Zhang CH. Confidence Intervals for Population Ranks in the Presence of Ties and Near Ties. J Am Stat Assoc 2009. [DOI: 10.1198/jasa.2009.0142] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]
23
Lin R, Louis TA, Paddock SM, Ridgeway G. Ranking USRDS provider specific SMRs from 1998-2001. Health Serv Outcomes Res Methodol 2008; 9:22-38. [PMID: 19343106 DOI: 10.1007/s10742-008-0040-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 05/25/2023]
Abstract
Provider profiling (ranking/percentiling) is prevalent in health services research. Bayesian models coupled with optimizing a loss function provide an effective framework for computing non-standard inferences such as ranks. Inferences depend on the posterior distribution and should be guided by inferential goals. However, even optimal methods might not lead to definitive results and ranks should be accompanied by valid uncertainty assessments. We outline the Bayesian approach and use estimated Standardized Mortality Ratios (SMRs) in 1998-2001 from the United States Renal Data System (USRDS) as a platform to identify issues and demonstrate approaches. Our analyses extend Liu et al. (2004) by computing estimates developed by Lin et al. (2006) that minimize errors in classifying providers above or below a percentile cut-point, by combining evidence over multiple years via a first-order, autoregressive model on log(SMR), and by use of a nonparametric prior. Results show that ranks/percentiles based on maximum likelihood estimates of the SMRs and those based on testing whether an SMR = 1 substantially under-perform the optimal estimates. Combining evidence over the four years using the autoregressive model reduces uncertainty, improving performance over percentiles based on only one year. Furthermore, percentiles based on posterior probabilities of exceeding a properly chosen SMR threshold are essentially identical to those produced by minimizing classification loss. Uncertainty measures effectively calibrate performance, showing that considerable uncertainty remains even when using optimal methods. Findings highlight the importance of using loss function guided percentiles and the necessity of accompanying estimates with uncertainty assessments.
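Here is a sketch of classifying providers above or below a percentile cut-point γ using posterior draws of each provider's SMR. With K providers and n_top = (1 − γ)K of them flagged, the posterior expected number misclassified is Σ_k p_k + Σ_{flagged k} (1 − 2p_k). In this expression, p_k is the posterior probability that provider k truly lies above the cut-point. The expression is minimized by flagging the n_top providers with the largest p_k. The draws and function names are illustrative assumptions, and this is not presented as the exact estimator of Lin et al. (2006).

```python
from typing import List, Tuple


def ranks_within_draw(values: List[float]) -> List[int]:
    """Rank providers within one draw (1 = smallest SMR); ties broken by index."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for position, unit in enumerate(order, start=1):
        ranks[unit] = position
    return ranks


def prob_in_top_group(draws: List[List[float]], n_top: int) -> List[float]:
    """Posterior probability that each provider is among the n_top largest SMRs."""
    n = len(draws[0])
    counts = [0] * n
    for draw in draws:
        for unit, rank in enumerate(ranks_within_draw(draw)):
            if rank > n - n_top:
                counts[unit] += 1
    return [c / len(draws) for c in counts]


def flag_top_group(draws: List[List[float]], n_top: int) -> Tuple[List[int], List[float]]:
    """Flag the n_top providers with the largest posterior probabilities of being in the top group."""
    probs = prob_in_top_group(draws, n_top)
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    return sorted(order[:n_top]), probs


def expected_misclassified(probs: List[float], flagged: List[int]) -> float:
    """Posterior expected number of providers placed on the wrong side of the cut-point."""
    chosen = set(flagged)
    return sum((1 - p) if k in chosen else p for k, p in enumerate(probs))


# Hypothetical posterior SMR draws: 4 draws (rows) for 5 providers (columns).
draws = [
    [1.2, 0.8, 1.5, 0.9, 1.1],
    [1.0, 0.9, 1.4, 1.3, 0.7],
    [1.3, 0.6, 1.1, 1.2, 1.0],
    [1.25, 1.0, 1.6, 0.95, 0.85],
]
flagged, probs = flag_top_group(draws, n_top=2)  # cut-point gamma = 0.6 for K = 5
print(flagged, probs)  # [0, 2] [0.75, 0.0, 0.75, 0.5, 0.0]
print(expected_misclassified(probs, flagged))  # 1.0
```

Even with the optimal choice, about one of the five providers is expected to be misclassified. This echoes the abstract's point that considerable uncertainty remains.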
Affiliation(s)
- Rongheng Lin
- Department of Public Health, University of Massachusetts Amherst, Rm 411 Arnold House, 715 N. Pleasant Rd., Amherst, MA 01003, USA