1
Nan N, Tian L. A new accuracy metric under three classes when subclasses are involved and its confidence interval estimation. Stat Med 2023; 42:5207-5228. [PMID: 37779490 DOI: 10.1002/sim.9908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 07/26/2023] [Accepted: 09/04/2023] [Indexed: 10/03/2023]
Abstract
"Compound multi-class classification" refers to the setting where three or more main classes are involved and at least one of the main classes has multiple subclasses. A common practice in evaluating biomarker performance under "compound multi-class classification" is "subclasses pooling." In this article, we first explore the downsides of accuracy metrics based on pooled data. Then we propose a new accuracy measure appropriate for "compound multi-class classification" with three ordinal main classes, namely "volume under compound ROC surface ($VUS_C$)." The proposed $VUS_C$ evaluates the accuracy of a biomarker appropriately by identifying main classes without requiring specification of an ordering for marker values of subclasses within each main class. For confidence interval estimation of $VUS_C$, both parametric and nonparametric methods are studied, and simulation studies are carried out to assess coverage probabilities. A subset of the Alzheimer's Disease Neuroimaging Initiative study dataset is analyzed.
Affiliation(s)
- Nan Nan, Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
- Lili Tian, Department of Biostatistics, University at Buffalo, Buffalo, New York, USA

2
Gao Y, Tian L. Interval estimation for the difference of two correlated gamma means: a generalized inference method and hybrid methods. J Stat Comput Sim 2022. [DOI: 10.1080/00949655.2022.2046747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Yi Gao, Department of Biostatistics, University at Buffalo, Buffalo, NY, USA
- Lili Tian, Department of Biostatistics, University at Buffalo, Buffalo, NY, USA

3
Gao Y, Tian L. Confidence interval estimation for sensitivity and difference between two sensitivities at a given specificity under tree ordering. Stat Med 2021; 40:3695-3723. [PMID: 33906262 DOI: 10.1002/sim.8993] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/18/2020] [Revised: 03/24/2021] [Accepted: 04/01/2021] [Indexed: 11/07/2022]
Abstract
This article considers a setting in diagnostic studies (or biomarker studies) that involves a healthy class and a diseased class, where the latter consists of several subclasses. The problem of interest is to evaluate the accuracy of a biomarker (or a diagnostic test) measured on a continuous scale in correctly distinguishing healthy subjects from diseased subjects, without requiring specification of an ordering in terms of marker values for subclasses relative to each other within the diseased class. Such a setting is quite common in practice, and it falls in the framework of tree ordering or umbrella ordering. This article explores several parametric and nonparametric approaches for estimating confidence intervals of the sensitivity of a single biomarker and of the difference between sensitivities of two correlated biomarkers under tree ordering at a given specificity. The performances of all the methods are evaluated and compared by a comprehensive simulation study. A published microarray data set is analyzed using the proposed methods.
Affiliation(s)
- Yi Gao, Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
- Lili Tian, Department of Biostatistics, University at Buffalo, Buffalo, New York, USA

4
Mokalled SC, McMahan CS, Tebbs JM, Andrew Brown D, Bilder CR. Incorporating the dilution effect in group testing regression. Stat Med 2021; 40:2540-2555. [PMID: 33598950 DOI: 10.1002/sim.8916] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/06/2020] [Revised: 11/25/2020] [Accepted: 02/03/2021] [Indexed: 11/10/2022]
Abstract
When screening for infectious diseases, group testing has proven to be a cost-efficient alternative to individual-level testing. Cost savings are realized by testing pools of individual specimens (eg, blood, urine, saliva, and so on) rather than by testing the specimens separately. However, a common concern that arises in group testing is the so-called "dilution effect." This occurs if the signal from a positive individual's specimen is diluted past an assay's threshold of detection when it is pooled with multiple negative specimens. In this article, we propose a new statistical framework for group testing data that merges estimation and case identification, which are often treated separately in the literature. Our approach analyzes continuous biomarker levels (eg, antibody levels, antigen concentrations, and so on) from pooled samples to estimate both a binary regression model for the probability of disease and the biomarker distributions for cases and controls. To increase case identification accuracy, we then show how estimates of the biomarker distributions can be used to select diagnostic thresholds on a pool-by-pool basis. Our proposals are evaluated through numerical studies and are illustrated using hepatitis B virus data collected on a prison population in Ireland.
Affiliation(s)
- Stefani C Mokalled, School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, USA
- Christopher S McMahan, School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, USA
- Joshua M Tebbs, Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Derek Andrew Brown, School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, USA
- Christopher R Bilder, Department of Statistics, University of Nebraska-Lincoln, Lincoln, Nebraska, USA

5
Saha-Chaudhuri P, Juwara L. Survival analysis under the Cox proportional hazards model with pooled covariates. Stat Med 2020; 40:998-1020. [PMID: 33210315 DOI: 10.1002/sim.8816] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/19/2019] [Revised: 10/23/2020] [Accepted: 10/29/2020] [Indexed: 11/09/2022]
Abstract
For a continuous time-to-event outcome and an expensive-to-measure exposure, we develop a pooling design and propose a likelihood-based approach to estimate the hazard ratios (HRs) of a Cox proportional hazards (PH) model. Our proposed approach fits a PH model based on pooled exposures with individually observed time-to-event outcomes. The design and estimation exploit the equivalence of the conditional logistic likelihood functions arising from a matched case-control study and the partial likelihood function of a riskset-matched, nested case-control (NCC) subset of a cohort study. To create the pools, we first focus on an NCC subcohort. Pools are formed at random while keeping the matching intact. Pool-level exposure and confounders are then evaluated and used in the likelihood to estimate the HR and the standard error of the estimates. The estimators are MLEs, provide consistent estimates of the individual-level HRs, and are asymptotically normal. Our simulation results indicate that the pooled estimates are comparable to the estimates obtained from the NCC subcohort. The units of analysis for the pooled design are the pools and not the individual participants, so the effective sample size is reduced and the variance of the HR estimate increases with increasing pool size. However, this variance inflation in small samples can be offset by including more matched controls per case within the NCC subcohort. An application is demonstrated with the Second Manifestations of ARTerial disease (SMART) study.
Affiliation(s)
- Lamin Juwara, Quantitative Life Sciences Program, McGill University, Montreal, Quebec, Canada

6
Harlow K, Ferreira CR, Sobreira TJP, Casey T, Stewart K. Lipidome profiles of postnatal day 2 vaginal swabs reflect fat composition of gilt's postnatal diet. PLoS One 2019; 14:e0215186. [PMID: 31557164 PMCID: PMC6762109 DOI: 10.1371/journal.pone.0215186] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/25/2019] [Accepted: 09/11/2019] [Indexed: 11/18/2022] Open
Abstract
We hypothesized that postnatal development of the vagina is impacted by early nutritional environment. Our objective was to determine if lipid profiles of vaginal swabs were different between postnatal gilts suckled by sow or fed milk replacer the first 48 h after birth, with or without a lard-based fat supplement. Gilts (>1.3 kg) were selected at birth across 8 litters and assigned to one of four treatments: 1) suckled by sow (S, n = 8); 2) suckled by sow plus administration of a fat supplement (SF, n = 5); 3) bottle-fed solely milk replacer (B, n = 8); or 4) bottle-fed solely milk replacer plus administration of a fat supplement (BF, n = 7). At 48 h postnatal, vaginal swabs of gilts were taken with a cytology brush, and lipids were extracted for analysis using multiple reaction monitoring (MRM)-profiling. Lipids extracted from serum collected at 48 h from gilts, milk collected at 24 h from sows, and milk replacer were also analyzed with MRM-profiling. Receiver operating characteristic curve analysis found 18 lipids recovered from vaginal swabs that highly distinguished between S and B gilts [area-under-the-curve (AUC) > 0.9], including phosphatidylethanolamine with 34 carbons and four unsaturations in the fatty acyl residues [PE (34:4)]. Twelve lipids from vaginal swabs highly correlated (r > 0.6; p < 0.01) with nutrition source. Lipids with greater abundance in milk replacer drove association. For example, mean intensity of PE (34:4) was 149-fold higher in milk replacer than colostrum. Consequently, PE (34:4) was found to have 1.6- and 2.12-fold higher levels in serum and vaginal swab samples (p < 0.001), respectively, of B gilts as compared to S gilts. Findings support that vaginal swabs can be used to noninvasively study effects of perinatal nutrition on tissue composition.
Affiliation(s)
- KaLynn Harlow, Department of Animal Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Christina R. Ferreira, Metabolomics Core, Bindley Science Center, Purdue University, West Lafayette, Indiana, United States of America
- Tiago J. P. Sobreira, Metabolomics Core, Bindley Science Center, Purdue University, West Lafayette, Indiana, United States of America
- Theresa Casey, Department of Animal Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Kara Stewart, Department of Animal Sciences, Purdue University, West Lafayette, Indiana, United States of America

7
On the use of min-max combination of biomarkers to maximize the partial area under the ROC curve. Journal of Probability and Statistics 2019; 2019. [PMID: 31057627 DOI: 10.1155/2019/8953530] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/17/2022] Open
Abstract
Background: Evaluation of diagnostic assays and predictive performance of biomarkers based on the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) is vital in diagnostic and targeted medicine. The partial area under the curve (pAUC) is an alternative metric focusing on a range of practical and clinical relevance of the diagnostic assay. In this article, we adopt and extend the min-max method to the estimation of the pAUC when multiple continuous-scale biomarkers are available and compare the performance of our proposed approach with existing approaches via simulations.
Methods: We conducted extensive simulation studies to investigate the performance of different methods for the combination of biomarkers based on their abilities to produce the largest pAUC estimates. Data were generated from different multivariate distributions with equal and unequal variance-covariance matrices. Different shapes of the ROC curves, false positive fraction ranges, and sample size configurations were considered. We obtained the mean and standard deviation of the pAUC estimates through re-substitution and leave-one-pair-out cross validation.
Results: Our results demonstrate that the proposed method provides the largest pAUC estimates under the following three important practical scenarios: (1) multivariate normally distributed data for non-diseased and diseased participants have unequal variance-covariance matrices; or (2) the ROC curves generated from individual biomarkers are relatively close, regardless of the latent normality distributional assumption; or (3) the ROC curves generated from individual biomarkers have straight-line shapes.
Conclusions: The proposed method is robust, and investigators are encouraged to use this approach in the estimation of the pAUC for many practical scenarios.
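As a rough illustration of the quantities named in this abstract, the sketch below computes an empirical pAUC over a false positive range [0, `fpr_max`] and searches a grid for a min-max combination weight. The abstract does not give the combination formula; the form "row-wise maximum plus `alpha` times row-wise minimum" is an assumption here, and the function names `partial_auc`, `min_max_score`, and `best_alpha` are hypothetical. Only re-substitution is shown (no leave-one-pair-out cross validation), and ties among control scores are not handled specially.

```python
import numpy as np


def partial_auc(controls, cases, fpr_max):
    """Empirical pAUC over FPR in [0, fpr_max]; larger scores indicate disease."""
    controls = np.sort(np.asarray(controls, dtype=float))[::-1]  # descending
    cases = np.asarray(cases, dtype=float)
    n0 = len(controls)
    area = 0.0
    for k in range(n0):
        lo = k / n0
        if lo >= fpr_max:
            break
        hi = min((k + 1) / n0, fpr_max)
        # On this FPR interval the threshold is the (k+1)-th largest control.
        tpr = np.mean(cases > controls[k])
        area += (hi - lo) * tpr
    return float(area)


def min_max_score(markers, alpha):
    """Assumed min-max form: row-wise max plus alpha times row-wise min."""
    markers = np.asarray(markers, dtype=float)
    return markers.max(axis=1) + alpha * markers.min(axis=1)


def best_alpha(control_markers, case_markers, fpr_max, grid):
    """Grid search for the alpha giving the largest re-substitution pAUC."""
    best = None
    for alpha in grid:
        value = partial_auc(
            min_max_score(control_markers, alpha),
            min_max_score(case_markers, alpha),
            fpr_max,
        )
        if best is None or value > best[1]:
            best = (float(alpha), value)
    return best
```

With perfectly separated scores the pAUC equals `fpr_max`, which is a convenient sanity check before applying the functions to real marker data.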
8
Zhang W, Liu A, Albert PS, Ashmead RD, Schisterman EF, Mills JL. A pooling strategy to effectively use genotype data in quantitative traits genome-wide association studies. Stat Med 2018; 37:4083-4095. [PMID: 30003569 DOI: 10.1002/sim.7898] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/07/2018] [Revised: 04/17/2018] [Accepted: 06/01/2018] [Indexed: 11/11/2022]
Abstract
The goal of quantitative traits genome-wide association studies is to identify associations between a phenotypic variable, such as a vitamin level, and genetic variants, often single-nucleotide polymorphisms. When funding limits the number of assays that can be performed to measure the level of the phenotypic variable, a subgroup of subjects is often randomly selected from the genotype database and the level of the phenotypic variable is then measured for each subject. Because only a proportion of the genotype data can be used, such a simple random sampling method may suffer from substantial loss of efficiency, especially when the number of assays is relatively small and the frequency of the less common variant (minor allele frequency) is low. We propose a pooling strategy in which subjects in a randomly selected reference subgroup are aligned with randomly selected subjects from the remaining study subjects to form independent pools; blood samples from subjects in each pool are mixed; and the level of the phenotypic variable is measured for each pool. We demonstrate that the proposed pooling approach produces considerable gains in efficiency over the simple random sampling method for inference concerning the phenotype-genotype association, resulting in higher precision and power. The methods are illustrated using genotypic and phenotypic data from the Trinity Students Study, a quantitative genome-wide association study.
Affiliation(s)
- Wei Zhang, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- Aiyi Liu, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- Paul S Albert, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Robert D Ashmead, Center for Statistical Research and Methodology, US Census Bureau, Washington, District of Columbia
- Enrique F Schisterman, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- James L Mills, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland

9
Affiliation(s)
- Juexin Lin, Department of Statistics, University of South Carolina, Columbia, SC, USA
- Dewei Wang, Department of Statistics, University of South Carolina, Columbia, SC, USA

10
Liu Y, McMahan C, Gallagher C. A general framework for the regression analysis of pooled biomarker assessments. Stat Med 2017; 36:2363-2377. [PMID: 28349583 PMCID: PMC5484591 DOI: 10.1002/sim.7291] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/28/2016] [Revised: 02/17/2017] [Accepted: 03/06/2017] [Indexed: 11/11/2022]
Abstract
As a cost-efficient data collection mechanism, the process of assaying pooled biospecimens is becoming increasingly common in epidemiological research; for example, pooling has been proposed for the purpose of evaluating the diagnostic efficacy of biological markers (biomarkers). To this end, several authors have proposed techniques that allow for the analysis of continuous pooled biomarker assessments. Regretfully, most of these techniques proceed under restrictive assumptions, are unable to account for the effects of measurement error, and fail to control for confounding variables. These limitations are understandably attributable to the complex structure that is inherent to measurements taken on pooled specimens. Consequently, in order to provide practitioners with the tools necessary to accurately and efficiently analyze pooled biomarker assessments, herein, a general Monte Carlo maximum likelihood-based procedure is presented. The proposed approach allows for the regression analysis of pooled data under practically all parametric models and can be used to directly account for the effects of measurement error. Through simulation, it is shown that the proposed approach can accurately and efficiently estimate all unknown parameters and is more computationally efficient than existing techniques. This new methodology is further illustrated using monocyte chemotactic protein-1 data collected by the Collaborative Perinatal Project in an effort to assess the relationship between this chemokine and the risk of miscarriage. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Yan Liu, Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
- Christopher McMahan, Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
- Colin Gallagher, Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A

11
Mitchell EM, Plowden TC, Schisterman EF. Estimating relative risk of a log-transformed exposure measured in pools. Stat Med 2016; 35:5477-5494. [PMID: 27530506 DOI: 10.1002/sim.7075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/08/2015] [Revised: 07/08/2016] [Accepted: 07/22/2016] [Indexed: 11/07/2022]
Abstract
Pooling biospecimens prior to performing laboratory assays is a useful tool to reduce costs, achieve minimum volume requirements and mitigate assay measurement error. When estimating the risk of a continuous, pooled exposure on a binary outcome, specialized statistical techniques are required. Current methods include a regression calibration approach, where the expectation of the individual-level exposure is calculated by adjusting the observed pooled measurement with additional covariate data. While this method employs a linear regression calibration model, we propose an alternative model that can accommodate log-linear relationships between the exposure and predictive covariates. The proposed model permits direct estimation of the relative risk associated with a log-transformation of an exposure measured in pools. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Affiliation(s)
- Emily M Mitchell, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, Maryland, U.S.A
- Torie C Plowden, Program in Reproductive and Adult Endocrinology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, Maryland, U.S.A
- Enrique F Schisterman, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, Maryland, U.S.A

12
Forbes TP, Najarro M. Ion mobility spectrometry nuisance alarm threshold analysis for illicit narcotics based on environmental background and a ROC-curve approach. Analyst 2016; 141:4438-46. [PMID: 27206280 PMCID: PMC5054301 DOI: 10.1039/c6an00844e] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 12/17/2022]
Abstract
The discriminative potential of an ion mobility spectrometer (IMS) for trace detection of illicit narcotics relative to environmental background was investigated with a receiver operating characteristic (ROC) curve framework. The IMS response of cocaine, heroin, methamphetamine, 3,4-methylenedioxymethamphetamine (MDMA), and Δ9-tetrahydrocannabinol (THC) was evaluated against environmental background levels derived from the screening of incoming delivery vehicles at a federal facility. Over 20,000 samples were collected over a multiyear period under two distinct sets of instrument operating conditions, a baseline mode and an increased desorption/drift tube temperature and sampling time mode. ROC curves provided a quantifiable representation of the interplay between sensitivity (true positive rate, TPR) and specificity (1 - false positive rate, FPR). A TPR of 90% and minimized FPR were targeted as the detection limits of IMS for the selected narcotics. MDMA, THC, and cocaine demonstrated single nanogram sensitivity at 90% TPR and <10% FPR, with improvements to both MDMA and cocaine in the elevated temperature/increased sampling mode. Detection limits in the tens of nanograms with poor specificity (FPR ≈ 20%) were observed for methamphetamine and heroin under baseline conditions. However, elevating the temperature reduced the background in the methamphetamine window, drastically improving its response (90% TPR and 3.8% FPR at 1 ng). On the contrary, the altered mode conditions increased the level of background for THC and heroin, partially offsetting observed enhancements to desorption. The presented framework demonstrated the significant effect environmental background distributions have on sensitivity and specificity.
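The threshold logic described in this abstract (fix a 90% TPR, then read off the FPR against the background distribution) can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function names `threshold_for_tpr` and `false_positive_rate` are hypothetical, the inputs are assumed to be one-dimensional arrays of instrument responses, and an alarm is assumed to fire when a response is at or above the threshold.

```python
import numpy as np


def threshold_for_tpr(positives, target_tpr=0.90):
    """Largest threshold at which at least target_tpr of positives alarm."""
    pos = np.sort(np.asarray(positives, dtype=float))[::-1]  # descending
    # Number of positive samples that must sit at or above the threshold;
    # the small offset guards against floating-point overshoot in the product.
    k = int(np.ceil(target_tpr * len(pos) - 1e-9))
    return float(pos[k - 1])


def false_positive_rate(background, threshold):
    """Fraction of background responses that would also trigger an alarm."""
    background = np.asarray(background, dtype=float)
    return float(np.mean(background >= threshold))
```

For example, `false_positive_rate(background, threshold_for_tpr(analyte))` gives one (TPR, FPR) operating point; repeating it across analyte masses or operating modes would trace out comparisons of the kind the abstract reports.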
Affiliation(s)
- Thomas P Forbes, National Institute of Standards and Technology, Materials Measurement Science Division, Gaithersburg, MD, USA.

13
McMahan CS, McLain AC, Gallagher CM, Schisterman EF. Estimating covariate-adjusted measures of diagnostic accuracy based on pooled biomarker assessments. Biom J 2016; 58:944-61. [PMID: 26927583 DOI: 10.1002/bimj.201500195] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/18/2015] [Revised: 12/31/2015] [Accepted: 01/06/2016] [Indexed: 11/10/2022]
Abstract
There is a need for epidemiological and medical researchers to identify new biomarkers (biological markers) that are useful in determining exposure levels and/or for the purposes of disease detection. Often this process is stunted by high testing costs associated with evaluating new biomarkers. Traditionally, biomarker assessments are individually tested within a target population. Pooling has been proposed to help alleviate the testing costs, where pools are formed by combining several individual specimens. Methods for using pooled biomarker assessments to estimate discriminatory ability have been developed. However, all these procedures have failed to acknowledge confounding factors. In this paper, we propose a regression methodology based on pooled biomarker measurements that allows the assessment of the discriminatory ability of a biomarker of interest. In particular, we develop covariate-adjusted estimators of the receiver-operating characteristic curve, the area under the curve, and Youden's index. We establish the asymptotic properties of these estimators and develop inferential techniques that allow one to assess whether a biomarker is a good discriminator between cases and controls, while controlling for confounders. The finite sample performance of the proposed methodology is illustrated through simulation. We apply our methods to analyze myocardial infarction (MI) data, with the goal of determining whether the pro-inflammatory cytokine interleukin-6 is a good predictor of MI after controlling for the subjects' cholesterol levels.
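For orientation, the two summary measures named in this abstract have simple unadjusted, individual-level empirical versions, sketched below. This is not the covariate-adjusted, pooled-data methodology the paper proposes; it only shows what the AUC and Youden's index measure. The function names are hypothetical, and larger marker values are assumed to indicate disease.

```python
import numpy as np


def empirical_auc(controls, cases):
    """Mann-Whitney estimate of P(case > control), counting ties as 1/2."""
    controls = np.asarray(controls, dtype=float)
    cases = np.asarray(cases, dtype=float)
    diff = cases[:, None] - controls[None, :]  # all case-control pairs
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def youden_index(controls, cases):
    """Maximum of sensitivity + specificity - 1 over observed cutoffs."""
    controls = np.asarray(controls, dtype=float)
    cases = np.asarray(cases, dtype=float)
    best_j, best_cutoff = -1.0, None
    for c in np.unique(np.concatenate([controls, cases])):
        sensitivity = np.mean(cases >= c)    # cases called positive
        specificity = np.mean(controls < c)  # controls called negative
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_cutoff = float(j), float(c)
    return best_j, best_cutoff
```

An AUC near 0.5 and a Youden's index near 0 both indicate a marker with little discriminatory ability, which is the kind of question the inferential techniques in the abstract address.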
Affiliation(s)
- Alexander C McLain, Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC 29208, USA
- Colin M Gallagher, Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, USA
- Enrique F Schisterman, Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA

14
Delaigle A, Zhou WX. Nonparametric and Parametric Estimators of Prevalence From Group Testing Data With Aggregated Covariates. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1054491] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]

15
Matthews GJ, Harel O. Examining statistical disclosure issues involving digital images of ROC curves. Stat (Int Stat Inst) 2015. [DOI: 10.1002/sta4.93] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Gregory J. Matthews, Department of Mathematics and Statistics, Loyola University Chicago, 1032 W. Sheridan Road, Chicago, IL 60660, USA
- Ofer Harel, Department of Statistics, University of Connecticut, Room 323, Philip E. Austin Building, 215 Glenbrook Rd. U-4120, Storrs, CT 06269-4120, USA

16
Wang D, McMahan CS, Gallagher CM. A general regression framework for group testing data, which incorporates pool dilution effects. Stat Med 2015; 34:3606-21. [PMID: 26173957 DOI: 10.1002/sim.6578] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 09/24/2014] [Revised: 04/21/2015] [Accepted: 06/15/2015] [Indexed: 01/01/2023]
Abstract
Group testing, through the use of pooling, has been widely implemented as a more efficient means to screen individuals for infectious diseases. Typically, in these settings, practitioners are tasked with the complementary goals of both case identification and estimation. For these purposes, many group testing strategies have been proposed, which address issues such as preserving anonymity in estimation studies, quality control, and classification. In general, these strategies require that a significant number of the individuals be retested, either in pools or individually. In order to provide practitioners with a general methodology that can be used to accurately and precisely analyze data of this form, herein, we propose a binary regression framework that can incorporate data arising from any group testing strategy. Further, we relax previously made assumptions regarding testing error rates by relating the diagnostic testing results to the latent biological marker levels of the individuals being tested. We investigate the finite sample performance of our proposed methodology through simulation and by applying our techniques to hepatitis B data collected as part of a study involving Irish prisoners.
Affiliation(s)
- Dewei Wang, Department of Statistics, University of South Carolina, Columbia, SC 29028, U.S.A
- Colin M Gallagher, Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, U.S.A

17
Mitchell EM, Lyles RH, Schisterman EF. Positing, fitting, and selecting regression models for pooled biomarker data. Stat Med 2015; 34:2544-58. [PMID: 25846980 DOI: 10.1002/sim.6496] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/29/2014] [Revised: 02/18/2015] [Accepted: 03/13/2015] [Indexed: 01/31/2023]
Abstract
Pooling biospecimens prior to performing lab assays can help reduce lab costs, preserve specimens, and reduce information loss when subject to a limit of detection. Because many biomarkers measured in epidemiological studies are positive and right-skewed, proper analysis of pooled specimens requires special methods. In this paper, we develop and compare parametric regression models for skewed outcome data subject to pooling, including a novel parameterization of the gamma distribution that takes full advantage of the gamma summation property. We also develop a Monte Carlo approximation of Akaike's Information Criterion applied to pooled data in order to guide model selection. Simulation studies and analysis of motivating data from the Collaborative Perinatal Project suggest that using Akaike's Information Criterion to select the best parametric model can help ensure valid inference and promote estimate precision.
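The "gamma summation property" mentioned in this abstract is the standard closure of the gamma family under sums with a common rate. The statement below is a sketch under two assumptions that the abstract does not spell out: the shape-rate parameterization, and a pooled measurement equal to the arithmetic mean of the $k$ individual values. The paper's own parameterization may differ.

```latex
X_i \overset{\text{ind}}{\sim} \mathrm{Gamma}(\alpha_i, \beta), \quad i = 1, \dots, k
\quad\Longrightarrow\quad
\sum_{i=1}^{k} X_i \sim \mathrm{Gamma}\!\Big(\sum_{i=1}^{k} \alpha_i,\; \beta\Big),
\qquad
\bar{X} = \frac{1}{k}\sum_{i=1}^{k} X_i \sim \mathrm{Gamma}\!\Big(\sum_{i=1}^{k} \alpha_i,\; k\beta\Big).
```

Under these assumptions the pooled mean stays in the gamma family, so a likelihood for pooled data can be written in closed form rather than through a convolution.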
Affiliation(s)
- Emily M Mitchell, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, MD, U.S.A
- Robert H Lyles, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, 30322, GA, U.S.A
- Enrique F Schisterman, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, MD, U.S.A

18
Sandhu SK, Halpern CH, Vakhshori V, Mirsaeedi-Farahani K, Farrar JT, Lee JYK. Brief Pain Inventory–Facial minimum clinically important difference. J Neurosurg 2015; 122:180-90. [DOI: 10.3171/2014.8.jns132547] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 11/06/2022]
Abstract
OBJECT
Neurosurgeons are frequently the primary physicians measuring pain relief in patients with trigeminal neuralgia (TN). Unfortunately, the measurement of pain can be complex. The Brief Pain Inventory–Facial (BPI-Facial) is a reliable and validated multidimensional tool that consists of 18 questions. It measures 3 domains of pain: 1) pain intensity (worst and average pain intensity), 2) interference with general activities of daily living (ADL), and 3) face-specific pain interference. The objective of this paper is to determine the patient-reported minimum clinically important difference (MCID) using the BPI-Facial.
METHODS
The authors conducted a retrospective study of 234 patients with TN seen in a single neurosurgeon's office. Patients completed baseline and 1-month follow-up BPI-Facial questionnaires. The MCID was calculated using an anchor-based approach in which the defined anchor was the 7-point patient global impression of change (PGIC). Two statistical methods were employed: mean change score and optimal cutoff point.
RESULTS
Using the mean change score method, the investigators calculated the MCID for the 3 domains of the BPI-Facial: 44% and 30% improvement in pain intensity at its worst and average, respectively, 54% improvement in interference with general ADL, and 63% improvement in interference with facial ADL. Using the optimal cutoff point method, they also calculated the MCID for the 3 domains of the BPI-Facial: 57% and 28% improvement in pain intensity at its worst and average, respectively, 75% improvement in interference with general ADL, and 62% improvement in interference with facial ADL.
CONCLUSIONS
The BPI-Facial is a multidimensional pain scale that measures 3 domains of pain. Although 2 statistical methods were used to calculate the MCID, the optimal cutoff point method was the superior one because it used data from the majority of subjects included in this study. A 57% improvement in pain intensity at its worst and a 28% improvement in pain intensity at its average were the MCIDs for patients with facial pain. A greater improvement was needed to achieve the MCID for interference with general and facial ADL. A 75% improvement in interference with general ADL and a 62% improvement in interference with facial ADL were needed to achieve an MCID. While pain intensity is easier to measure, pain's interference with ADL may be more important for patient outcomes when designing or evaluating interventions in the field of TN. The BPI-Facial is a useful instrument to measure changes in multidimensional aspects of pain in patients with TN.
Affiliation(s)
- John T. Farrar
- Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, Pennsylvania
|
19
|
Vexler A, Tao G, Chen X. A toolkit for clinical statisticians to fix problems based on biomarker measurements subject to instrumental limitations: from repeated measurement techniques to a hybrid pooled-unpooled design. Methods Mol Biol 2015; 1208:439-60. [PMID: 25323525 DOI: 10.1007/978-1-4939-1441-8_31] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
The aim of this chapter is to review and examine different methods in order to display correct and efficient statistical techniques based on complete/incomplete data subject to different sorts of measurement error (ME) problems. Instrument inaccuracies, biological variations, and/or errors in questionnaire-based self-report data can lead to significant MEs in various clinical experiments. Ignoring MEs can cause bias or inconsistency of statistical inferences. The biostatistical literature well addresses two categories of MEs: errors related to additive models and errors caused by the limit of detection (LOD). Several statistical approaches have been developed to analyze data affected by MEs, including the parametric/nonparametric likelihood methodologies, Bayesian methods, the single and multiple imputation techniques, and the repeated measurement design of experiment. We present a novel hybrid pooled-unpooled design as one of the strategies to provide correct statistical inferences when data is subject to MEs. This hybrid design and the classical techniques are compared to show the advantages and disadvantages of the considered methods.
Affiliation(s)
- Albert Vexler
- Department of Biostatistics, New York State University at Buffalo, 715 Kimball Tower, 3435 Main Street, Buffalo, NY 14214, USA
|
20
|
Heffernan AL, Aylward LL, Toms LML, Sly PD, Macleod M, Mueller JF. Pooled biological specimens for human biomonitoring of environmental chemicals: opportunities and limitations. Journal of Exposure Science & Environmental Epidemiology 2014; 24:225-32. [PMID: 24192659 DOI: 10.1038/jes.2013.76] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Received: 03/11/2013] [Accepted: 08/31/2013] [Indexed: 05/03/2023]
Abstract
Biomonitoring has become the "gold standard" in assessing chemical exposures, and has an important role in risk assessment. The pooling of biological specimens (combining multiple individual specimens into a single sample) can be used in biomonitoring studies to monitor levels of exposure and identify exposure trends or to identify susceptible populations in a cost-effective manner. Pooled samples provide an estimate of central tendency and may also reveal information about variation within the population. The development of a pooling strategy requires careful consideration of the type and number of samples collected, the number of pools required and the number of specimens to combine per pool in order to maximise the type and robustness of the data. Creative pooling strategies can be used to explore exposure-outcome associations, and extrapolation from other larger studies can be useful in identifying elevated exposures in specific individuals. The use of pooled specimens is advantageous as it saves significantly on analytical costs, may reduce the time and resources required for recruitment and, in certain circumstances, allows quantification of samples approaching the limit of detection. In addition, the use of pooled samples can provide population estimates while avoiding ethical difficulties that may be associated with reporting individual results.
Affiliation(s)
- Amy L Heffernan
- National Research Centre for Environmental Toxicology (Entox), University of Queensland, Brisbane, Queensland, Australia
- Leisa-Maree L Toms
- School of Clinical Sciences and Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia
- Peter D Sly
- Children's Health and Environment Program, Queensland Children's Medical Research Institute, University of Queensland, Herston, Queensland, Australia
- Matthew Macleod
- Department of Applied Environmental Science, Stockholm University, Stockholm, Sweden
- Jochen F Mueller
- National Research Centre for Environmental Toxicology (Entox), University of Queensland, Brisbane, Queensland, Australia
|
21
|
Saha-Chaudhuri P, Weinberg CR. Specimen pooling for efficient use of biospecimens in studies of time to a common event. Am J Epidemiol 2013; 178:126-35. [PMID: 23821316 DOI: 10.1093/aje/kws442] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022] Open
Abstract
For case-control studies that rely on expensive assays for biomarkers, specimen pooling offers a cost-effective and efficient way to estimate individual-level odds ratios. Pooling helps to conserve irreplaceable biospecimens for the future, mitigates limit-of-detection problems, and enables inclusion of individuals who have limited available volumes of biospecimen. Pooling can also allow the study of a panel of biomarkers under a fixed assay budget. Here, we extend this method for application to discrete-time survival studies. Assuming a proportional odds logistic model for risk of a common outcome, we propose a design strategy that forms pooling sets within those experiencing the outcome at the same event time. We show that the proposed design enables a cost-effective analysis to assess the association of a biomarker with the outcome. Because the standard likelihood is slightly misspecified for the proposed pooling strategy under a nonnull biomarker effect, the proposed approach produces slightly biased estimates of exposure odds ratios. We explore the extent of this bias via simulations and illustrate the method by revisiting a data set relating polychlorinated biphenyls and 1,1-dichloro-2,2-bis(p-chlorophenyl)ethylene to time to pregnancy.
Affiliation(s)
- Paramita Saha-Chaudhuri
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA.
|
22
|
Symmetry Properties of Bi-Normal and Bi-Gamma Receiver Operating Characteristic Curves are Described by Kullback-Leibler Divergences. Entropy 2013. [DOI: 10.3390/e15041342] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
|
23
|
Affiliation(s)
- Enrique F Schisterman
- Epidemiology Branch, Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, 6100 Executive Boulevard Room 7B03, Bethesda, MD 20892-7510, USA.
|
24
|
Danaher MR, Schisterman EF, Roy A, Albert PS. Estimation of gene-environment interaction by pooling biospecimens. Stat Med 2012; 31:3241-52. [PMID: 22859290 DOI: 10.1002/sim.5357] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/21/2011] [Accepted: 02/08/2012] [Indexed: 11/09/2022]
Abstract
Case-control studies are prone to low power for testing gene-environment interactions (GXE) given the need for a sufficient number of individuals on each strata of disease, gene, and environment. We propose a new study design to increase power by strategically pooling biospecimens. Pooling biospecimens allows us to increase the number of subjects significantly, thereby providing substantial increase in power. We focus on a special, although realistic case, where disease and environmental statuses are binary, and gene status is ordinal with each individual having 0, 1, or 2 minor alleles. Through pooling, we obtain an allele frequency for each level of disease and environmental status. Using the allele frequencies, we develop a new methodology for estimating and testing GXE that is comparable to the situation when we have complete data on gene status for each individual. We also explore the measurement process and its effect on the GXE estimator. Using an illustration, we show the effectiveness of pooling with an epidemiologic study, which tests an interaction for fiber and paraoxonase on anovulation. Through simulation, we show that taking 12 pooled measurements from 1000 individuals achieves more power than individually genotyping 500 individuals. Our findings suggest that strategic pooling should be considered when an investigator designs a pilot study to test for a GXE.
Affiliation(s)
- M R Danaher
- Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, MD, U.S.A.
|
25
|
Whitcomb BW, Perkins NJ, Zhang Z, Ye A, Lyles RH. Assessment of skewed exposure in case-control studies with pooling. Stat Med 2012; 31:2461-72. [PMID: 22437722 DOI: 10.1002/sim.5351] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 01/17/2011] [Accepted: 01/23/2012] [Indexed: 12/31/2022]
Abstract
Pooling-based strategies that combine samples from multiple participants for laboratory assays have been proposed for epidemiologic investigations of biomarkers to address issues including cost, efficiency, detection, and when minimal sample volume is available. A modification of the standard logistic regression model has been previously described to allow use with pooled data; however, this model makes assumptions regarding exposure distribution and logit-linearity of risk (i.e., constant odds ratio) that can be violated in practice. We were motivated by a nested case-control study of miscarriage and inflammatory factors with highly skewed distributions to develop a more flexible model for analysis of pooled data. Using characteristics of the gamma distribution and the relation between models of binary outcome conditional on exposure and of exposure conditional on outcome, we use a modified logistic regression to accommodate nonlinearity because of unequal shape parameters in gamma distributed exposure for cases and controls. Using simulations, we compare our approach with existing methods for logistic regression for pooled data considering: (1) constant and dose-dependent effects; (2) gamma and log-normal distributed exposure; (3) effect size; and (4) the proportions of biospecimens pooled. We show that our approach allows estimation of odds ratios that vary with exposure level, yet has minimal loss of efficiency compared with existing approaches when exposure effects are dose-invariant. Our model performed similarly to a maximum likelihood estimation approach in terms of bias and efficiency, and provides an easily implemented approach for estimation with pooled biomarker data when effects may not be constant across exposure.
Affiliation(s)
- Brian W Whitcomb
- Division of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts Amherst, Amherst, MA, USA.
|
26
|
Ma CX, Vexler A, Schisterman EF, Tian L. Cost-efficient designs based on linearly associated biomarkers. J Appl Stat 2011. [DOI: 10.1080/02664763.2011.567254] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/18/2022]
|
27
|
Schisterman EF, Vexler A, Ye A, Perkins NJ. A combined efficient design for biomarker data subject to a limit of detection due to measuring instrument sensitivity. Ann Appl Stat 2011. [DOI: 10.1214/11-aoas490] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]
|
28
|
Malinovsky Y, Albert PS, Schisterman EF. Pooling designs for outcomes under a Gaussian random effects model. Biometrics 2011; 68:45-52. [PMID: 21981372 DOI: 10.1111/j.1541-0420.2011.01673.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 12/19/2022]
Abstract
Due to the rising cost of laboratory assays, it has become increasingly common in epidemiological studies to pool biospecimens. This is particularly true in longitudinal studies, where the cost of performing multiple assays over time can be prohibitive. In this article, we consider the problem of estimating the parameters of a Gaussian random effects model when the repeated outcome is subject to pooling. We consider different pooling designs for the efficient maximum likelihood estimation of variance components, with particular attention to estimating the intraclass correlation coefficient. We evaluate the efficiencies of different pooling design strategies using analytic and simulation study results. We examine the robustness of the designs to skewed distributions and consider unbalanced designs. The design methodology is illustrated with a longitudinal study of premenopausal women focusing on assessing the reproducibility of F2-isoprostane, a biomarker of oxidative stress, over the menstrual cycle.
Affiliation(s)
- Yaakov Malinovsky
- Division of Epidemiology, Statistics, and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland 20892, USA
|
29
|
Zhang Z, Liu A, Lyles RH, Mukherjee B. Logistic regression analysis of biomarker data subject to pooling and dichotomization. Stat Med 2011; 31:2473-84. [PMID: 21953741 DOI: 10.1002/sim.4367] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 01/13/2011] [Revised: 07/11/2011] [Accepted: 07/26/2011] [Indexed: 11/07/2022]
Abstract
There is growing interest in pooling specimens across subjects in epidemiologic studies, especially those involving biomarkers. This paper is concerned with regression analysis of epidemiologic data where a binary exposure is subject to pooling and the pooled measurement is dichotomized to indicate either that no subjects in the pool are exposed or that some are exposed, without revealing further information about the exposed subjects in the latter case. The pooling process may be stratified on the disease status (a binary outcome) and possibly other variables but is otherwise assumed random. We propose methods for estimating parameters in a prospective logistic regression model and illustrate these with data from a population-based case-control study of colorectal cancer. Simulation results show that the proposed methods perform reasonably well in realistic settings and that pooling can lead to sizable gains in cost efficiency. We make recommendations with regard to the choice of design for pooled epidemiologic studies.
Affiliation(s)
- Z Zhang
- Biostatistics and Bioinformatics Branch, Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892-7510, USA.
|
30
|
Vexler A, Tsai WM, Malinovsky Y. Estimation and testing based on data subject to measurement errors: from parametric to non-parametric likelihood methods. Stat Med 2011; 31:2498-512. [PMID: 21805485 DOI: 10.1002/sim.4304] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 01/09/2011] [Accepted: 04/29/2011] [Indexed: 11/07/2022]
Abstract
Measurement error (ME) problems can cause bias or inconsistency of statistical inferences. When investigators are unable to obtain correct measurements of biological assays, special techniques to quantify MEs need to be applied. Sampling based on repeated measurements is a common strategy to allow for ME. This method has been well addressed in the literature under parametric assumptions. The approach with repeated measures data may not be applicable when the replications are complicated because of cost and/or time concerns. Pooling designs have been proposed as cost-efficient sampling procedures that can assist to provide correct statistical operations based on data subject to ME. We demonstrate that a mixture of both pooled and unpooled data (a hybrid pooled-unpooled design) can support very efficient estimation and testing in the presence of ME. Nonparametric techniques have not been well investigated to analyze repeated measures data or pooled data subject to ME. We propose and examine both the parametric and empirical likelihood methodologies for data subject to ME. We conclude that the likelihood methods based on the hybrid samples are very efficient and powerful. The results of an extensive Monte Carlo study support our conclusions. Real data examples demonstrate the efficiency of the proposed methods in practice.
Affiliation(s)
- Albert Vexler
- Department of Biostatistics, The State University of New York, Buffalo, NY 14214, USA.
|
31
|
Abstract
It has become increasingly common in epidemiological studies to pool specimens across subjects to achieve accurate quantitation of biomarkers and certain environmental chemicals. In this article, we consider the problem of fitting a binary regression model when an important exposure is subject to pooling. We take a regression calibration approach and derive several methods, including plug-in methods that use a pooled measurement and other covariate information to predict the exposure level of an individual subject, and normality-based methods that make further adjustments by assuming normality of calibration errors. Within each class we propose two ways to perform the calibration (covariate augmentation and imputation). These methods are shown in simulation experiments to effectively reduce the bias associated with the naive method that simply substitutes a pooled measurement for all individual measurements in the pool. In particular, the normality-based imputation method performs reasonably well in a variety of settings, even under skewed distributions of calibration errors. The methods are illustrated using data from the Collaborative Perinatal Project.
Affiliation(s)
- Zhiwei Zhang
- Biostatistics and Bioinformatics Branch, Division of Epidemiology, Statistics, and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland 20892, USA.
|
32
|
|
33
|
Heine JJ, Land WH, Egan KM. Statistical learning techniques applied to epidemiology: a simulated case-control comparison study with logistic regression. BMC Bioinformatics 2011; 12:37. [PMID: 21272346 PMCID: PMC3045299 DOI: 10.1186/1471-2105-12-37] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/16/2010] [Accepted: 01/27/2011] [Indexed: 11/16/2022] Open
Abstract
Background: When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL) techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretations. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with logistic regression (LR) modeling representing a parametric approach. The SL technique was comprised of a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. Results: The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. Conclusions: The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.
Affiliation(s)
- John J Heine
- H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA.
|
34
|
Jafarzadeh SR, Johnson WO, Utts JM, Gardner IA. Bayesian estimation of the receiver operating characteristic curve for a diagnostic test with a limit of detection in the absence of a gold standard. Stat Med 2010; 29:2090-106. [DOI: 10.1002/sim.3975] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 12/17/2022]
|
35
|
Schisterman EF, Vexler A, Mumford SL, Perkins NJ. Hybrid pooled-unpooled design for cost-efficient measurement of biomarkers. Stat Med 2010; 29:597-613. [PMID: 20049693 DOI: 10.1002/sim.3823] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 11/11/2022]
Abstract
Evaluating biomarkers in epidemiological studies can be expensive and time consuming. Many investigators use techniques such as random sampling or pooling biospecimens in order to cut costs and save time on experiments. Commonly, analyses based on pooled data are strongly restricted by distributional assumptions that are challenging to validate because of the pooled biospecimens. Random sampling provides data that can be easily analyzed. However, random sampling methods are not optimal cost-efficient designs for estimating means. We propose and examine a cost-efficient hybrid design that involves taking a sample of both pooled and unpooled data in an optimal proportion in order to efficiently estimate the unknown parameters of the biomarker distribution. In addition, we find that this design can be used to estimate and account for different types of measurement and pooling error, without the need to collect validation data or repeated measurements. We show an example where application of the hybrid design leads to minimization of a given loss function based on variances of the estimators of the unknown parameters. Monte Carlo simulation and biomarker data from a study on coronary heart disease are used to demonstrate the proposed methodology.
Affiliation(s)
- Enrique F Schisterman
- Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH/DHHS, Rockville, MD 20852, USA.
|
36
|
Vexler A, Liu A, Schisterman E. Nonparametric deconvolution of density estimation based on observed sums. J Nonparametr Stat 2010. [DOI: 10.1080/10485250903094286] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
|
37
|
Cardell LO, Andersson M, Cervin A, Davidsson A, Hellgren J, Holmström M, Lundblad L, Stierna P, Stjärne P, Adner M. Genes regulating molecular and cellular functions in noninfectious nonallergic rhinitis. Allergy 2009; 64:1301-8. [PMID: 19432938 DOI: 10.1111/j.1398-9995.2009.02009.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 01/25/2023]
Abstract
BACKGROUND: Chronic noninfectious, nonallergic rhinitis (NINAR) is a complex syndrome with a principally unknown pathophysiology. New technology has made it possible to examine differentially expressed genes and according to network theory, genes connected by their function that might have key roles in the disease. METHODS: Connectivity analysis was used to identify NINAR key genes. mRNA was extracted from nasal biopsies from 12 NINAR patients and 12 healthy volunteers. Microarrays were performed using Affymetrix chips with 54 613 genes. Data were analysed with the Ingenuity Pathway System for organization of genes into annotated biological functions and, thereafter, linking genes into networks due to their connectivity. The regulation of key genes was confirmed with reverse transcription-polymerase chain reaction (RT-PCR). RESULTS: In all, 43 genes were differentially expressed. The functional analysis showed that these genes were primarily involved in cellular movement, haematological system development and immune response. Merging these functions, 10 genes were found to be shared. Network analysis generated three networks and two of these 'shared genes' in key positions, c-fos and cell division cycle 42 (Cdc42). These genes were upregulated in both the array and the RT-PCR analysis. CONCLUSION: Ten genes were found to be of pathophysiological interest for NINAR and of these, c-fos and Cdc42 seemed to be of specific interest due to their ability to interact with other genes of interest within this context. Although the role of c-fos and Cdc42 in upper airway inflammation remains unknown, they might be used as potential disease markers.
|
38
|
Ye X, Pierik FH, Angerer J, Meltzer HM, Jaddoe VWV, Tiemeier H, Hoppin JA, Longnecker MP. Levels of metabolites of organophosphate pesticides, phthalates, and bisphenol A in pooled urine specimens from pregnant women participating in the Norwegian Mother and Child Cohort Study (MoBa). Int J Hyg Environ Health 2009; 212:481-91. [PMID: 19394271 DOI: 10.1016/j.ijheh.2009.03.004] [Citation(s) in RCA: 135] [Impact Index Per Article: 9.0] [Received: 10/17/2008] [Revised: 03/17/2009] [Accepted: 03/18/2009] [Indexed: 11/16/2022]
Abstract
Concerns about reproductive and developmental health risks of exposure to organophosphate (OP) pesticides, phthalates, and bisphenol A (BPA) among the general population are increasing. Six dialkyl phosphate (DAP) metabolites, 3,5,6-trichloro-2-pyridinol (TCPy), BPA, and fourteen phthalate metabolites were measured in 10 pooled urine samples representing 110 pregnant women who participated in the Norwegian Mother and Child Birth Cohort (MoBa) study in 2004. Daily intakes were estimated from urinary data and compared with reference doses (RfDs) and daily tolerable intakes (TDIs). The MoBa women had a higher mean BPA concentration (4.50 µg/L) than the pregnant women in the Generation R Study (Generation R) in the Netherlands and the National Health and Nutrition Examination Survey (NHANES) in the United States. The mean concentration of total DAP metabolites (24.20 µg/L) in MoBa women was higher than that in NHANES women but lower than that in Generation R women. The diethyl phthalate metabolite mono-ethyl phthalate (MEP) was the dominant phthalate metabolite in all three studies, with mean concentrations greater than 300 µg/L. The MoBa and Generation R women had higher mean concentrations of mono-n-butyl phthalate (MnBP) and mono-isobutyl phthalate (MiBP) than the NHANES women. The estimated average daily intakes of BPA, chlorpyrifos/chlorpyrifos-methyl and phthalates in MoBa (and the other two studies) were below the RfDs and TDIs. The higher levels of metabolites in the MoBa participants may have been from intake via pesticide residues in food (organophosphates), consumption of canned food, especially fish/seafood (BPA), and use of personal care products (selected phthalates).
Affiliation(s)
- Xibiao Ye
- Epidemiology Branch, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), MD A3-05, PO Box 12233, Research Triangle Park, NC 27709, USA
|
39
|
Abstract
The use of biomarkers to assess exposure and investigate biomedical questions is common in epidemiology. The usefulness of biomarker research, however, is contingent upon the ability to achieve a complete understanding of the role they play within a population. In estimating distributional parameters for a particular biomarker, such as oxidative stress or antioxidant markers, scientists face two main challenges: overcoming the cost of performing a large number of assays and dealing with data subject to a limit of detection. While approaches have been suggested to deal with each of these issues individually, pooling is a strategy that can address both problems.
|
40
|
Schisterman EF, Vexler A. To pool or not to pool, from whether to when: applications of pooling to biospecimens subject to a limit of detection. Paediatr Perinat Epidemiol 2008; 22:486-96. [PMID: 18782255 PMCID: PMC2749284 DOI: 10.1111/j.1365-3016.2008.00956.x] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Indexed: 01/12/2023]
Abstract
Pooling of biological specimens has been utilised as a cost-efficient sampling strategy, but cost is not the unique limiting factor in biomarker development and evaluation. We examine the effect of different sampling strategies of biospecimens for exposure assessment that cannot be detected below a detection threshold (DT). The paper compares use of pooled samples to a randomly selected sample from a cohort in order to evaluate the efficiency of parameter estimates. The proposed approach shows that a pooling design is more efficient than a random sample strategy under certain circumstances. Moreover, because pooling minimises the amount of information lost below the DT, the use of pooled data is preferable (in a context of a parametric estimation) to using all available individual measurements, for certain values of the DT. We propose a combined design, which applies pooled and unpooled biospecimens, in order to capture the strengths of the different sampling strategies and overcome instrument limitations (i.e. DT). Several Monte Carlo simulations and an example based on actual biomarker data illustrate the results of the article.
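The interaction between pooling and a detection threshold can be illustrated with a small calculation. The sketch below is not the authors' method; it assumes individual biomarker values are normally distributed and that a pooled assay returns the arithmetic mean of the specimens in the pool. The function name `fraction_below_dt` and all parameter values are hypothetical.

```python
from statistics import NormalDist

def fraction_below_dt(mu, sigma, dt, pool_size=1):
    """Probability that a single assay result falls below the detection threshold.

    Assumptions (illustrative, not taken from the abstract): individual values
    are N(mu, sigma^2), and a pooled assay equals the mean of `pool_size`
    independent specimens, so the pooled value is N(mu, sigma^2 / pool_size).
    """
    pooled_sd = sigma / pool_size ** 0.5
    return NormalDist(mu, pooled_sd).cdf(dt)

# Compare unpooled assays with pools of 4 for a threshold below and above the mean.
for dt in (0.5, 1.5):
    single = fraction_below_dt(mu=1.0, sigma=1.0, dt=dt)
    pooled = fraction_below_dt(mu=1.0, sigma=1.0, dt=dt, pool_size=4)
    print(f"DT={dt}: unpooled {single:.3f}, pools of 4 {pooled:.3f}")
```

Under these assumptions, pooling lowers the share of results lost below the threshold when the threshold sits below the population mean and raises it when the threshold sits above the mean, which is one way to read the abstract's "for certain values of the DT".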
Affiliation(s)
- Enrique F Schisterman
- Division of Epidemiology, Statistics and Prevention Research, National Institute of Child Health and Human Development, National Institutes of Health, Rockville, MD 20852, USA.
|
41
|
Vexler A, Schisterman EF, Liu A. Estimation of ROC curves based on stably distributed biomarkers subject to measurement error and pooling mixtures. Stat Med 2008; 27:280-96. [PMID: 17721905 PMCID: PMC2761639 DOI: 10.1002/sim.3035] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 11/10/2022]
Abstract
Additive measurement errors and pooling design are objectively two different issues, which have been separately and extensively dealt with in the biostatistics literature. However, these topics usually correspond to problems of reconstructing a summand's distribution of the biomarker by the distribution of the convoluted observations. Thus, we associate the two issues into one stated problem. The integrated approach creates an opportunity to investigate new fields, e.g. a subject of pooling errors, issues regarding pooled data affected by measurement errors. To be specific, we consider the stated problem in the context of the receiver operating characteristic (ROC) curves analysis, which is the well-accepted tool for evaluating the ability of a biomarker to discriminate between two populations. The present paper considers a wide family of biospecimen distributions. In addition, applied assumptions, which are related to distribution functions of biomarkers, are mainly conditioned by the reconstructing problem. We propose and examine maximum likelihood techniques based on the following data: a biomarker with measurement error; pooled samples; and pooled samples with measurement error. The obtained methods are illustrated by applications to real data studies.
Affiliation(s)
- Albert Vexler
- National Institute of Child Health and Human Development, USA
|
42
|
Bondell HD, Liu A, Schisterman EF. Statistical Inference Based on Pooled Data: A Moment-Based Estimating Equation Approach. J Appl Stat 2007. [DOI: 10.1080/02664760600994844] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
|
43
|
Skates SJ, Horick NK, Moy JM, Minihan AM, Seiden MV, Marks JR, Sluss P, Cramer DW. Pooling of Case Specimens to Create Standard Serum Sets for Screening Cancer Biomarkers. Cancer Epidemiol Biomarkers Prev 2007; 16:334-41. [PMID: 17301268 DOI: 10.1158/1055-9965.epi-06-0681] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
BACKGROUND Multiple identical sets of sera from cancer cases and controls would facilitate standardized testing of biomarkers. We describe the creation and use of standard serum sets developed from healthy donors and pooled sera from ovarian, breast, and endometrial cancer cases. METHODS Two hundred seventy-five 0.3-mL aliquots of sera were created for each of the 95 healthy women, and residual serum was pooled to create 275 identical sets of 20 0.3-mL aliquots. Aliquots (1.0-1.5 mL) from 441 women were combined to create 12 breast and pelvic disease pools with at least 115 0.3-mL aliquots. Sets were assembled to contain aliquots from individual controls, replicates, and disease pools. Cancer antigens (CA) CA 125, CA 19.9, and CA 15.3, and carcinoembryonic antigen were measured in one set and in 217 women comprising six of the pelvic disease pools. Use of a set was illustrated for mesothelin (soluble mesothelin-related protein). Statistical output included concentration differences between pooled cases and controls (z values for single analytes; Mahalanobis distances for pairs), correlation between z values and sensitivities, coefficients of variation, and standardized biases. RESULTS Marker concentrations in the six pelvic disease pools were generally within 0.25 SD of the actual average, and z values correlated well with sensitivities. CA 125 remains the best single marker for nonmucinous ovarian cancer, complemented by CA 15.3 or soluble mesothelin-related protein. There is no comparable breast cancer biomarker among the current analytes tested. CONCLUSION The potential value of standard serum sets for initial assessment of candidate biomarkers is illustrated. Sets are now available through the Early Detection Research Network to evaluate biomarkers for women's cancers.
Affiliation(s)
- Steven J Skates
- Biostatistics Center, Massachusetts General Hospital, Boston, USA
|
44
|
Vexler A, Liu A, Schisterman EF. Efficient Design and Analysis of Biospecimens with Measurements Subject to Detection Limit. Biom J 2006; 48:780-91. [PMID: 17094343 DOI: 10.1002/bimj.200610266] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022]
Abstract
Pooling biospecimens is a well-accepted sampling strategy in biomedical research to reduce the study cost of measuring biomarkers, and has been shown in the case of normally distributed data to yield more efficient estimation. In this paper we examine the efficiency of pooling, in the context of the information matrix related to estimators of unknown parameters, when the biospecimens being pooled yield incomplete observations due to the instruments' limit of detection. Our investigation of three sampling strategies shows that, for a range of values of the detection limit, pooling is the most efficient sampling procedure. For certain other values of the detection limit, pooling can perform poorly.
Affiliation(s)
- Albert Vexler
- Division of Epidemiology, Statistics and Prevention Research, National Institute of Child Health and Human Development, NIH/DHHS, 6100 Executive Blvd., Rockville, MD 20852, USA.
|
45
|
Schisterman EF, Perkins NJ, Liu A, Bondell H. Optimal cut-point and its corresponding Youden Index to discriminate individuals using pooled blood samples. Epidemiology 2005; 16:73-81. [PMID: 15613948 DOI: 10.1097/01.ede.0000147512.81966.ba] [Citation(s) in RCA: 805] [Impact Index Per Article: 42.4] [Indexed: 12/25/2022]
Abstract
Costs can hamper the evaluation of the effectiveness of new biomarkers. Analysis of smaller numbers of pooled specimens has been shown to be a useful cost-cutting technique. The Youden index (J), a function of sensitivity (q) and specificity (p), is a commonly used measure of overall diagnostic effectiveness. More importantly, J is the maximum vertical distance or difference between the ROC curve and the diagonal or chance line; it occurs at the cut-point that optimizes the biomarker's differentiating ability when equal weight is given to sensitivity and specificity. Using the additive property of the gamma and normal distributions, we present a method to estimate the Youden index and the optimal cut-point, and extend its applications to pooled samples. We study the effect of pooling when only a fixed number of individuals are available for testing, and pooling is carried out to save on the number of assays. We measure loss of information by the change in root mean squared error of the estimates of the optimal cut-point and the Youden index, and we study the extent of this loss via a simulation study. In conclusion, pooling can result in a substantial cost reduction while preserving the effectiveness of estimators, especially when the pool size is not very large.
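The definition of J and its use with pooled samples can be sketched in a few lines. This is a simplified illustration rather than the paper's estimator: it assumes normally distributed biomarkers with higher values in cases, equal-sized pools whose assay value is the mean of the pooled individuals, and a plain grid search for the cut-point. The function names `individual_params_from_pools` and `youden_index` are hypothetical.

```python
from statistics import NormalDist, mean, variance

def individual_params_from_pools(pooled_values, pool_size):
    """Estimate the individual-level mean and SD from pooled assay values.

    Assumption: each pooled value is the average of `pool_size` independent
    N(mu, sigma^2) measurements, so Var(pooled) = sigma^2 / pool_size.
    """
    mu = mean(pooled_values)
    sigma = (pool_size * variance(pooled_values)) ** 0.5
    return mu, sigma

def youden_index(mu0, sd0, mu1, sd1, grid=20001):
    """Return (J, cut-point) for controls N(mu0, sd0^2) and cases N(mu1, sd1^2).

    J(c) = sensitivity(c) + specificity(c) - 1 = F_controls(c) - F_cases(c),
    treating values above c as positive. The maximum is found by grid search.
    """
    controls, cases = NormalDist(mu0, sd0), NormalDist(mu1, sd1)
    lo = min(mu0 - 4 * sd0, mu1 - 4 * sd1)
    hi = max(mu0 + 4 * sd0, mu1 + 4 * sd1)
    best_j, best_c = -1.0, lo
    for i in range(grid):
        c = lo + (hi - lo) * i / (grid - 1)
        j = controls.cdf(c) - cases.cdf(c)
        if j > best_j:
            best_j, best_c = j, c
    return best_j, best_c

# Hypothetical pooled assay values (pools of 2) for controls and cases.
control_pools = [0.1, -0.3, 0.4, 0.0, -0.2, 0.5, -0.4, 0.2]
case_pools = [2.2, 1.7, 2.4, 1.9, 2.1, 1.6, 2.5, 1.8]
mu0, sd0 = individual_params_from_pools(control_pools, pool_size=2)
mu1, sd1 = individual_params_from_pools(case_pools, pool_size=2)
j, cut = youden_index(mu0, sd0, mu1, sd1)
print(f"estimated J = {j:.3f} at cut-point {cut:.3f}")
```

With equal standard deviations the grid search should land on the midpoint of the two means, which gives a quick sanity check on the routine before applying it to estimated parameters.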
Affiliation(s)
- Enrique F Schisterman
- Division of Epidemiology, Statistics and Prevention Research, National Institute of Child Health and Human Development, National Institutes of Health, DHHS, Bethesda, Maryland, USA.
|