1
Chung Y, Cai T, Newcomb L, Lin DW, Zheng Y. Improving Efficiency and Robustness of the Prognostic Accuracy of Biomarkers With Partial Incomplete Failure-Time Data and Auxiliary Outcome: Application to Prostate Cancer Active Surveillance Study. Stat Med 2025; 44:e70072. PMID: 40277348; DOI: 10.1002/sim.70072.
Abstract
When novel biomarkers are developed for the clinical management of patients diagnosed with cancer, it is critical to quantify the accuracy of a biomarker-based decision tool. The evaluation can be challenging when the definite outcome T, such as time to disease progression, is only partially ascertained on a limited set of study patients. Under settings where T is only observed on a subset but an auxiliary outcome correlated with T is available on all subjects, we propose an augmented estimation procedure for commonly used time-dependent accuracy measures. The augmented estimators are easy to implement without imposing modeling assumptions between the two types of time-to-event outcomes and are more efficient than the complete-case estimator. When the ascertainment of the outcome is non-random and subject to informative censoring, we further augment our proposed method with inverse probability weighting to improve robustness. Results from simulation studies confirm the robustness and efficiency properties of the proposed estimators. The method is illustrated with data from the Canary Prostate Active Surveillance Study.
Affiliation(s)
- Yunro Chung: College of Health Solutions, Arizona State University, Phoenix, AZ, USA; Biodesign Center for Personalized Diagnostics, Arizona State University, Tempe, AZ, USA
- Tianxi Cai: Department of Biostatistics, Harvard University, Boston, MA, USA
- Lisa Newcomb: Department of Urology, University of Washington, Seattle, WA, USA
- Daniel W Lin: Department of Urology, University of Washington, Seattle, WA, USA
- Yingye Zheng: Department of Biostatistics, Fred Hutchinson Cancer Center, Seattle, WA, USA
2
Wang S, Shi S, Qin G. Interval estimation for the Youden index of a continuous diagnostic test with verification biased data. Stat Methods Med Res 2025:9622802251322989. PMID: 40111816; DOI: 10.1177/09622802251322989.
Abstract
In medical diagnostic studies, the Youden index plays a crucial role as a comprehensive measurement of diagnostic test effectiveness, aiding in determining the optimal threshold value by maximizing the sum of sensitivity and specificity. However, in clinical practice, verification of true disease status might be partially missing, and estimators based on partially validated subjects are usually biased. While verification bias-corrected estimation methods for the receiver operating characteristic curve have been widely studied, no such results have been specifically developed for the Youden index. In this paper, we propose bias-corrected interval estimation methods for the Youden index of a continuous test under the missing-at-random assumption. Based on four estimators (full imputation (FI), mean score imputation, inverse probability weighting, and the semiparametric efficient (SPE) estimator) introduced by Alonzo and Pepe for handling verification bias, we develop multiple confidence intervals for the Youden index by applying bootstrap resampling and the method of variance estimates recovery (MOVER). Extensive simulation and real data studies show that when the disease model is correctly specified, MOVER-FI intervals yield better coverage probability. We also observe a tradeoff between methods when the verification proportion is low: bootstrap approaches achieve higher accuracy, while MOVER approaches deliver greater precision. Remarkably, the bootstrap-SPE intervals exhibit appealing double robustness to model misspecification and perform adequately across almost all scenarios considered. Based on our findings, we recommend using the bootstrap-SPE intervals when the true disease model is unknown, and the MOVERws-FI interval if the true disease model can be well approximated.
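The Youden index itself is simple to compute once disease status is known; the paper's contribution is interval estimation under partial verification. The sketch below, with simulated data and hypothetical variable names, only illustrates the definition J = max over cutoffs of {sensitivity + specificity - 1} on fully verified data; the FI, mean score imputation, IPW, and SPE corrections and the bootstrap/MOVER intervals are not reproduced.

```python
# Illustrative only: Youden index and optimal cutoff with fully verified disease status.
# The verification-bias corrections discussed in the abstract are not implemented here.
import numpy as np

def youden_index(score, disease):
    """Return (J, cutoff) maximizing sensitivity + specificity - 1 over observed cutoffs."""
    score, disease = np.asarray(score, float), np.asarray(disease, int)
    best_j, best_c = -np.inf, None
    for c in np.unique(score):
        se = np.mean(score[disease == 1] >= c)   # sensitivity at cutoff c
        sp = np.mean(score[disease == 0] < c)    # specificity at cutoff c
        if se + sp - 1 > best_j:
            best_j, best_c = se + sp - 1, c
    return best_j, best_c

rng = np.random.default_rng(0)
d = rng.binomial(1, 0.3, 500)                    # simulated disease status
x = rng.normal(1.0 * d, 1.0)                     # simulated continuous test
print(youden_index(x, d))
```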
Affiliation(s)
- Shirui Wang: Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, USA
- Shuangfei Shi: Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, USA
- Gengsheng Qin: Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, USA
3
Shi S, Qin G. Direct estimation of volume under the ROC surface with verification bias. J Biopharm Stat 2024; 34:553-581. PMID: 37470408; DOI: 10.1080/10543406.2023.2236202.
Abstract
In practice, the receiver operating characteristic (ROC) curve of a diagnostic test is widely used to show the performance of the test for discriminating two-class events. The area under the ROC curve (AUC) is proposed as an index for the assessment of the diagnostic accuracy of the test under consideration. Due to ethical and cost considerations associated with application of gold standard (GS) tests, only a subset of the patients initially tested have verified disease status. Statistical evaluation of the test performance based only on test results from subjects with verified disease status is typically biased. Various AUC estimation methods for tests with verification biased data have been developed over the last few decades. In this article, we develop new direct estimation methods for the volume under the ROC surface (VUS) by extending the AUC estimation methods for two-class diagnostic tests to three-class diagnostic tests in the presence of verification bias. The proposed methods will provide a comprehensive guide for dealing with verification bias in three-class diagnostic test accuracy studies and lead to a better choice of diagnostic tests.
Affiliation(s)
- Shuangfei Shi: Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, USA
- Gengsheng Qin: Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, USA
4
Lange J, Zhao Y, Gogebakan KC, Olivas-Martinez A, Ryser MD, Gard CC, Etzioni R. Test sensitivity in a prospective cancer screening program: A critique of a common proxy measure. Stat Methods Med Res 2023; 32:1053-1063. PMID: 37287266; DOI: 10.1177/09622802221142529.
Abstract
The true sensitivity of a cancer screening test, defined as the frequency with which the test returns a positive result if the cancer is present, is a key indicator of diagnostic performance. Given the challenges of directly assessing test sensitivity in a prospective screening program, proxy measures for true sensitivity are frequently reported. We call one such proxy empirical sensitivity, as it is given by the observed ratio of screen-detected cancers to the sum of screen-detected and interval cancers. In the setting of the canonical three-state Markov model for progression from preclinical onset to clinical diagnosis, we formulate a mathematical relationship for how empirical sensitivity varies with the screening interval and the mean preclinical sojourn time and identify conditions under which empirical sensitivity exceeds or falls short of true sensitivity. In particular, when the inter-screening interval is short relative to the mean sojourn time, empirical sensitivity tends to exceed true sensitivity, unless true sensitivity is high. The Breast Cancer Surveillance Consortium (BCSC) has reported an estimate of 0.87 for the empirical sensitivity of digital mammography. We show that this corresponds to a true sensitivity of 0.82 under a mean sojourn time of 3.6 years estimated based on breast cancer screening trials. However, the BCSC estimate of empirical sensitivity corresponds to even lower true sensitivity under more contemporary, longer estimates of mean sojourn time. Consistently applied nomenclature that distinguishes empirical sensitivity from true sensitivity is needed to ensure that published estimates of sensitivity from prospective screening studies are properly interpreted.
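To make the contrast concrete, the following minimal simulation (not the paper's closed-form relationship) generates a steady-state screening program with exponential sojourn times and a constant per-screen true sensitivity, then reports the empirical-sensitivity proxy defined above, i.e. screen-detected cancers divided by screen-detected plus interval cancers. All parameter values and modeling choices are illustrative assumptions.

```python
# Minimal simulation sketch (not the paper's analytical formula): compare true per-screen
# sensitivity with the "empirical sensitivity" proxy under an exponential sojourn time,
# a fixed screening interval, onset uniformly located within an interval, and a constant
# per-screen true sensitivity. Parameter values below are illustrative only.
import numpy as np

def empirical_sensitivity(true_sens, mean_sojourn, screen_interval, n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    sojourn = rng.exponential(mean_sojourn, n)          # preclinical sojourn time
    t = rng.uniform(0, screen_interval, n)              # time from onset to the next screen
    screen_detected = np.zeros(n, dtype=bool)
    active = np.ones(n, dtype=bool)
    while active.any():
        due = active & (t < sojourn)                    # a screen happens before clinical surfacing
        hit = due & (rng.random(n) < true_sens)         # test positive at this screen
        screen_detected |= hit
        active = due & ~hit                             # missed cases go on to the next screen
        t = t + screen_interval
    # everyone not screen-detected surfaces clinically as an interval cancer
    return screen_detected.mean()

for interval in (0.5, 1.0, 2.0, 4.0):
    print(interval, round(empirical_sensitivity(0.82, 3.6, interval), 3))
```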
Affiliation(s)
- Jane Lange: Oregon Health and Science University, Knight Cancer Institute, Portland, OR, USA
- Yibai Zhao: Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Antonio Olivas-Martinez: Department of Biostatistics, University of Washington School of Public Health, Seattle, WA, USA
- Marc D Ryser: Department of Population Health Sciences and Department of Mathematics, Duke University, Durham, NC, USA
- Charlotte C Gard: Department of Economics, Applied Statistics and International Business, New Mexico State University, Las Cruces, NM, USA
- Ruth Etzioni: Oregon Health and Science University, Knight Cancer Institute, Portland, OR, USA; Department of Biostatistics, University of Washington School of Public Health, Seattle, WA, USA; Department of Health Services, University of Washington School of Public Health, Seattle, WA, USA
5
Shao Y, Todd K, Shutes-David A, Millard SP, Brown K, Thomas A, Chen K, Wilson K, Zeng QT, Tsuang DW. Identifying probable dementia in undiagnosed Black and White Americans using machine learning in Veterans Health Administration electronic health records. medRxiv [Preprint] 2023:2023.02.08.23285540. PMID: 36798376; PMCID: PMC9934793; DOI: 10.1101/2023.02.08.23285540.
Abstract
The application of machine learning (ML) tools in electronic health records (EHRs) can help reduce the underdiagnosis of dementia, but models that are not designed to reflect minority populations may perpetuate that underdiagnosis. To address the underdiagnosis of dementia in both Black Americans (BAs) and white Americans (WAs), we sought to develop and validate ML models that assign race-specific risk scores. These scores were used to identify undiagnosed dementia in BA and WA Veterans in EHRs. More specifically, risk scores were generated separately for BAs (n=10K) and WAs (n=10K) in training samples of cases and controls by performing ML, equivalence mapping, topic modeling, and a support vector machine (SVM) in structured and unstructured EHR data. Scores were validated via blinded manual chart reviews (n=1.2K) of controls from a separate sample (n=20K). AUCs and negative and positive predictive values (NPVs and PPVs) were calculated to evaluate the models. There was a strong positive relationship between SVM-generated risk scores and undiagnosed dementia. BAs were more likely than WAs to have undiagnosed dementia per chart review, both overall (15.3% vs 9.5%) and among Veterans with >90th percentile cutoff scores (25.6% vs 15.3%). With chart reviews as the reference standard and varied cutoff scores, the BA model performed slightly better than the WA model (AUC=0.86 with NPV=0.98 and PPV=0.26 at >90th percentile cutoff vs AUC=0.77 with NPV=0.98 and PPV=0.15 at >90th). The AUCs, NPVs, and PPVs suggest that race-specific ML models can assist in the identification of undiagnosed dementia, particularly in BAs. Future studies should investigate implementing EHR-based risk scores in clinics that serve both BA and WA Veterans.
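As a reading aid only, the sketch below shows how AUC, PPV, and NPV are obtained when a continuous risk score is dichotomized at the 90th percentile against a chart-review reference standard; the data are simulated and bear no relation to the study's VA cohort or models.

```python
# Illustrative sketch (hypothetical, simulated data): AUC plus PPV/NPV at a >90th
# percentile risk-score cutoff, mirroring the kind of summary reported in the abstract.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
undiagnosed = rng.binomial(1, 0.12, 2000)              # chart-review reference standard (simulated)
score = rng.normal(1.5 * undiagnosed, 1.0)             # risk score (simulated, not an SVM output)

cut = np.quantile(score, 0.90)                         # >90th percentile cutoff
flag = score > cut
ppv = undiagnosed[flag].mean()                         # P(undiagnosed dementia | flagged)
npv = 1 - undiagnosed[~flag].mean()                    # P(no dementia | not flagged)
print(round(roc_auc_score(undiagnosed, score), 3), round(ppv, 3), round(npv, 3))
```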
Affiliation(s)
- Yijun Shao: Washington DC VA Medical Center, Washington, DC, United States; George Washington University, Science and Engineering Hall, Washington, DC, United States
- Kaitlin Todd: Geriatric Research, Education, and Clinical Center, VA Puget Sound Health Care System, Seattle, WA, United States
- Andrew Shutes-David: Geriatric Research, Education, and Clinical Center, VA Puget Sound Health Care System, Seattle, WA, United States; Mental Illness Research, Education, and Clinical Center, VA Puget Sound Health Care System, Seattle, WA, United States
- Steven P. Millard: Geriatric Research, Education, and Clinical Center, VA Puget Sound Health Care System, Seattle, WA, United States
- Karl Brown: Geriatric Research, Education, and Clinical Center, VA Puget Sound Health Care System, Seattle, WA, United States
- Amy Thomas: Geriatric Research, Education, and Clinical Center, VA Puget Sound Health Care System, Seattle, WA, United States; Department of Medicine, University of Washington, Seattle, WA, United States
- Kathryn Chen: Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, United States
- Katherine Wilson: Geriatric Research, Education, and Clinical Center, VA Puget Sound Health Care System, Seattle, WA, United States; Department of Biostatistics, University of Washington, Seattle, WA, United States
- Qing T. Zeng: Washington DC VA Medical Center, Washington, DC, United States; George Washington University, Science and Engineering Hall, Washington, DC, United States
- Debby W. Tsuang: Geriatric Research, Education, and Clinical Center, VA Puget Sound Health Care System, Seattle, WA, United States; Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, United States
6
Arifin WN, Yusof UK. Partial Verification Bias Correction Using Inverse Probability Bootstrap Sampling for Binary Diagnostic Tests. Diagnostics (Basel) 2022; 12(11):2839. PMID: 36428900; PMCID: PMC9689704; DOI: 10.3390/diagnostics12112839.
Abstract
In medical care, it is important to evaluate any new diagnostic test in the form of diagnostic accuracy studies. These new tests are compared to gold standard tests, where the performance of binary diagnostic tests is usually measured by sensitivity (Sn) and specificity (Sp). However, these accuracy measures are often biased owing to selective verification of the patients, known as partial verification bias (PVB). Inverse probability bootstrap (IPB) sampling is a general method to correct sampling bias in model-based analysis and produces debiased data for analysis. However, its utility in PVB correction has not been investigated before. The objective of this study was to investigate IPB in the context of PVB correction under the missing-at-random assumption for binary diagnostic tests. IPB was adapted for PVB correction, and tested and compared with existing methods using simulated and clinical data sets. The results indicated that IPB is accurate for Sn and Sp estimation as it showed low bias. However, IPB was less precise than existing methods as indicated by the higher standard error (SE). Despite this issue, it is recommended to use IPB when subsequent analysis with full data analytic methods is expected. Further studies must be conducted to reduce the SE.
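A minimal sketch of the general inverse probability bootstrap idea follows, assuming missingness at random given the index test and one covariate: estimate verification probabilities, resample verified subjects with selection probabilities proportional to their inverse, and average Sn and Sp over resamples. The verification model, variable names, and data layout are assumptions, not the authors' implementation.

```python
# Sketch of inverse probability bootstrap (IPB) resampling for partial verification bias,
# assuming verification is missing at random given the test result T and a covariate X.
# Not the authors' code; the logistic verification model and inputs are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipb_sn_sp(T, X, D, V, n_boot=200, seed=0):
    """T: binary test, X: covariate, D: disease (used only where V==1), V: verified flag."""
    rng = np.random.default_rng(seed)
    Z = np.column_stack([T, X])
    p_verify = LogisticRegression().fit(Z, V).predict_proba(Z)[:, 1]
    idx_v = np.flatnonzero(V == 1)
    w = 1.0 / p_verify[idx_v]
    w = w / w.sum()                                   # resampling probabilities
    sn, sp = [], []
    for _ in range(n_boot):
        b = rng.choice(idx_v, size=idx_v.size, replace=True, p=w)
        d, t = D[b], T[b]
        sn.append(np.mean(t[d == 1] == 1))
        sp.append(np.mean(t[d == 0] == 0))
    return np.mean(sn), np.mean(sp)
```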
Affiliation(s)
- Wan Nor Arifin: School of Computer Sciences, Universiti Sains Malaysia, Gelugor 11800, Pulau Pinang, Malaysia; Biostatistics and Research Methodology Unit, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian 16150, Kelantan, Malaysia
- Umi Kalsom Yusof: School of Computer Sciences, Universiti Sains Malaysia, Gelugor 11800, Pulau Pinang, Malaysia
7
Pfeiffer RM, Chen Y, Gail MH, Ankerst DP. Accommodating population differences when validating risk prediction models. Stat Med 2022; 41:4756-4780. PMID: 36224712; PMCID: PMC10510530; DOI: 10.1002/sim.9447.
Abstract
Validation of risk prediction models in independent data provides a more rigorous assessment of model performance than internal assessment, for example, done by cross-validation in the data used for model development. However, several differences between the populations that gave rise to the training and the validation data can lead to seemingly poor performance of a risk model. In this paper we formalize the notions of "similarity" or "relatedness" of the training and validation data, and define reproducibility and transportability. We address the impact of different distributions of model predictors and differences in verifying the disease status or outcome on measures of calibration, accuracy and discrimination of a model. When individual level information from both the training and validation data sets is available, we propose and study weighted versions of the validation metrics that adjust for differences in the risk factor distributions and in outcome verification between the training and validation data to provide a more comprehensive assessment of model performance. We provide conditions on the risk model and the populations that gave rise to the training and validation data that ensure a model's reproducibility or transportability, and show how to check these conditions using weighted and unweighted performance measures. We illustrate the method by developing and validating a model that predicts the risk of developing prostate cancer using data from two large prostate cancer screening trials.
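One common way to build such weights, sketched below purely for illustration (it is not the paper's exact estimator), is to estimate the covariate density ratio with a training-versus-validation membership model and then compute weighted performance measures, here a weighted AUC, in the validation data. Whether weights should target the training or the validation covariate distribution depends on the estimand; the choice below is an assumption.

```python
# Hedged illustration: importance weights that reweight a validation sample toward the
# training population's covariate distribution (density ratio p_train(x)/p_valid(x)
# estimated via a logistic "membership" model), followed by a weighted AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression

def covariate_shift_weights(X_train, X_valid):
    X = np.vstack([X_train, X_valid])
    g = np.r_[np.ones(len(X_train)), np.zeros(len(X_valid))]    # 1 = training sample
    p = LogisticRegression().fit(X, g).predict_proba(X_valid)[:, 1]
    return p / (1 - p) * (len(X_valid) / len(X_train))          # density-ratio weights

def weighted_auc(risk, y, w):
    """Weighted probability that a random case outranks a random control."""
    controls = y == 0
    num = den = 0.0
    for i in np.flatnonzero(y == 1):
        gt = (risk[i] > risk[controls]) + 0.5 * (risk[i] == risk[controls])
        num += w[i] * np.sum(w[controls] * gt)
        den += w[i] * np.sum(w[controls])
    return num / den
```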
Affiliation(s)
- Yiyao Chen: Technical University of Munich, Garching, Germany
8
Arifin WN, Yusof UK. Correcting for partial verification bias in diagnostic accuracy studies: A tutorial using R. Stat Med 2022; 41:1709-1727. PMID: 35043447; DOI: 10.1002/sim.9311.
Abstract
Diagnostic tests play a crucial role in medical care. Thus any new diagnostic tests must undergo a thorough evaluation. New diagnostic tests are evaluated in comparison with the respective gold standard tests. The performance of binary diagnostic tests is quantified by accuracy measures, with sensitivity and specificity being the most important measures. In any diagnostic accuracy study, the estimates of these measures are often biased owing to selective verification of the patients, which is referred to as partial verification bias. Several methods for correcting partial verification bias are available depending on the scale of the index test, target outcome, and missing data mechanism. However, these are not easily accessible to the researchers due to the complexity of the methods. This article aims to provide a brief overview of the methods available to correct for partial verification bias involving a binary diagnostic test and provide a practical tutorial on how to implement the methods using the statistical programming language R.
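As a concrete taste of the kind of correction such tutorials cover, here is a minimal sketch of the classical Begg and Greenes (1983) correction for a binary test under MAR verification (verification may depend on the test result but not on disease status given the test). It is an illustration in Python, not the article's R code, and the variable names are assumptions.

```python
# Minimal sketch of the Begg-Greenes correction for sensitivity and specificity of a
# binary index test under missing-at-random verification. Illustrative, not the article's code.
import numpy as np

def begg_greenes(T, D, V):
    """T: binary test (0/1); D: disease status (used only where V==1); V: verified flag (0/1)."""
    T, D, V = map(np.asarray, (T, D, V))
    p_t1 = np.mean(T == 1)                              # P(T=1), from all subjects
    # P(D=1 | T=t), estimated from verified subjects only (valid under MAR given T)
    p_d_t1 = np.mean(D[(V == 1) & (T == 1)] == 1)
    p_d_t0 = np.mean(D[(V == 1) & (T == 0)] == 1)
    sens = p_d_t1 * p_t1 / (p_d_t1 * p_t1 + p_d_t0 * (1 - p_t1))
    spec = (1 - p_d_t0) * (1 - p_t1) / ((1 - p_d_t0) * (1 - p_t1) + (1 - p_d_t1) * p_t1)
    return sens, spec
```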
Affiliation(s)
- Wan Nor Arifin: School of Computer Sciences, Universiti Sains Malaysia, Pulau Pinang, Malaysia; Biostatistics and Research Methodology Unit, School of Medical Sciences, Universiti Sains Malaysia, Kelantan, Malaysia
- Umi Kalsom Yusof: School of Computer Sciences, Universiti Sains Malaysia, Pulau Pinang, Malaysia
9
Abstract
Nonparametric inference of the area under the ROC curve (AUC) has been well developed in the presence of either verification bias or clustering. However, current nonparametric methods are not able to handle cases where both verification bias and clustering are present. Such a case arises when a two-phase study design is applied to a cohort of subjects (verification bias) where each subject might have multiple test results (clustering). In such cases, the inference of AUC must account for both verification bias and intra-cluster correlation. In the present paper, we propose an inverse probability weighted (IPW) AUC estimator that corrects for verification bias and derive a variance formula to account for intra-cluster correlations between disease status and test results. Results of a simulation study indicate that the method that assumes independence underestimates the true variance of the IPW AUC estimator in the presence of intra-cluster correlations. The proposed method, on the other hand, provides a consistent variance estimate for the IPW AUC estimator by appropriately accounting for correlations between true disease statuses and between test results.
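For orientation, the sketch below shows the basic form of an IPW AUC estimator: a weighted Mann-Whitney statistic over verified subjects, with weights equal to the inverse of an estimated verification probability. The cluster-robust variance that is the paper's contribution is not reproduced, and the verification model and data layout are assumptions.

```python
# Sketch of an inverse probability weighted AUC estimator under MAR verification.
# Illustrative only; the paper's clustering-aware variance formula is not implemented.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_auc(y_score, disease, verified, design_matrix):
    pi = LogisticRegression().fit(design_matrix, verified).predict_proba(design_matrix)[:, 1]
    w = verified / pi                                    # zero weight for unverified subjects
    keep = verified == 1
    s, d, w = y_score[keep], disease[keep], w[keep]
    cases, controls = np.flatnonzero(d == 1), np.flatnonzero(d == 0)
    num = den = 0.0
    for i in cases:
        comp = (s[i] > s[controls]) + 0.5 * (s[i] == s[controls])
        num += w[i] * np.sum(w[controls] * comp)
        den += w[i] * np.sum(w[controls])
    return num / den
```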
Affiliation(s)
- Yougui Wu: Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, Tampa, Florida, USA
10
Lin R, Chan KG, Shi H. A unified Bayesian framework for exact inference of area under the receiver operating characteristic curve. Stat Methods Med Res 2021; 30:2269-2287. PMID: 34468238; DOI: 10.1177/09622802211037070.
Abstract
The area under the receiver operating characteristic curve is a widely used measure for evaluating the performance of a diagnostic test. Common approaches for inference on the area under the receiver operating characteristic curve are usually based upon approximation. For example, the normal approximation based inference tends to suffer from the problem of low accuracy for small sample sizes. Frequentist empirical likelihood based approaches for area under the receiver operating characteristic curve estimation may perform better, but are usually conducted through approximation in order to reduce the computational burden, thus the inference is not exact. By contrast, we propose an exact inferential procedure by adapting the empirical likelihood into a Bayesian framework and drawing inference from the posterior samples of the area under the receiver operating characteristic curve obtained via a Gibbs sampler. The full conditional distributions within the Gibbs sampler only involve empirical likelihoods with linear constraints, which greatly simplify the computation. To further enhance the applicability and flexibility of the Bayesian empirical likelihood, we extend our method to the estimation of the partial area under the receiver operating characteristic curve, comparison of multiple tests, and the doubly robust estimation of the area under the receiver operating characteristic curve in the presence of missing test results. Simulation studies confirm the desirable performance of the proposed methods, and a real application is presented to illustrate its usefulness.
Affiliation(s)
- Ruitao Lin: Department of Biostatistics, The University of Texas MD Anderson Cancer Center, USA
- Kc Gary Chan: Department of Biostatistics, University of Washington, USA
- Haolun Shi: Department of Statistics and Actuarial Science, Simon Fraser University, Canada
11
Hai Y, Qin G. Direct estimation of the area under the receiver operating characteristic curve with verification biased data. Stat Med 2020; 39:4789-4820. DOI: 10.1002/sim.8753.
Affiliation(s)
- Yan Hai: Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, USA
- Gengsheng Qin: Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, USA
12
Doubly robust kernel density estimation when group membership is missing at random. J Stat Plan Inference 2020. DOI: 10.1016/j.jspi.2019.09.010.
13
Umemneku Chikere CM, Wilson K, Graziadio S, Vale L, Allen AJ. Diagnostic test evaluation methodology: A systematic review of methods employed to evaluate diagnostic tests in the absence of gold standard - An update. PLoS One 2019; 14:e0223832. PMID: 31603953; PMCID: PMC6788703; DOI: 10.1371/journal.pone.0223832.
Abstract
OBJECTIVE To systematically review methods developed and employed to evaluate the diagnostic accuracy of medical tests when there is a missing or no gold standard. STUDY DESIGN AND SETTINGS Articles that proposed or applied any methods to evaluate the diagnostic accuracy of medical test(s) in the absence of a gold standard were reviewed. The protocol for this review was registered in PROSPERO (CRD42018089349). RESULTS Identified methods were classified into four main groups: methods employed when there is a missing gold standard; correction methods (which make adjustment for an imperfect reference standard with known diagnostic accuracy measures); methods employed to evaluate a medical test using multiple imperfect reference standards; and other methods, like agreement studies, and a mixed group of alternative study designs. Fifty-one statistical methods were identified from the review that were developed to evaluate medical test(s) when the true disease status of some participants is unverified with the gold standard. Seven correction methods were identified, and four methods were identified to evaluate medical test(s) using multiple imperfect reference standards. Flow diagrams were developed to guide the selection of appropriate methods. CONCLUSION Various methods have been proposed to evaluate medical test(s) in the absence of a gold standard for some or all participants in a diagnostic accuracy study. These methods depend on the availability of the gold standard, its application to the participants in the study, and the availability of alternative reference standard(s). The clinical application of some of these methods, especially methods developed for settings where the gold standard is missing, is however limited. This may be due to the complexity of these methods and/or a disconnection between the fields of expertise of those who develop the methods (e.g. mathematicians) and those who employ them (e.g. clinical researchers). This review aims to help close this gap with our classification and guidance tools.
Affiliation(s)
- Chinyereugo M. Umemneku Chikere: Institute of Health & Society, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, England, United Kingdom
- Kevin Wilson: School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, England, United Kingdom
- Sara Graziadio: National Institute for Health Research, Newcastle In Vitro Diagnostics Co-operative, Newcastle upon Tyne Hospitals National Health Services Foundation Trust, Newcastle upon Tyne, England, United Kingdom
- Luke Vale: Institute of Health & Society, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, England, United Kingdom
- A. Joy Allen: National Institute for Health Research, Newcastle In Vitro Diagnostics Co-operative, Newcastle University, Newcastle upon Tyne, England, United Kingdom
14
Zhu R, Ghosal S. Bayesian nonparametric estimation of ROC surface under verification bias. Stat Med 2019; 38:3361-3377. PMID: 31049998; DOI: 10.1002/sim.8181.
Abstract
The receiver operating characteristic (ROC) surface, as a generalization of the ROC curve, has been widely used to assess the accuracy of a diagnostic test for three categories. A common problem is verification bias, referring to the situation where not all subjects have their true classes verified. In this paper, we consider the problem of estimating the ROC surface under verification bias. We adopt a Bayesian nonparametric approach by directly modeling the underlying distributions of the three categories by Dirichlet process mixture priors. We propose a robust computing algorithm by only imposing a missing at random assumption for the verification process but no assumption on the distributions. The method can also accommodate covariate information in estimating the ROC surface, which can lead to a more comprehensive understanding of the diagnostic accuracy. It can be adapted and hugely simplified in the case where there is no verification bias, and very fast computation is possible through the Bayesian bootstrap process. The proposed method is compared with other commonly used methods by extensive simulations. We find that the proposed method generally outperforms other approaches. Applying the method to two real datasets, the key findings are as follows: (1) human epididymis protein 4 has a slightly better diagnostic ability than CA125 in discriminating healthy, early stage, and late stage patients of epithelial ovarian cancer. (2) Serum albumin has a prognostic ability in distinguishing different stages of hepatocellular carcinoma.
Affiliation(s)
- Rui Zhu: Department of Statistics, North Carolina State University, Raleigh, North Carolina
- Subhashis Ghosal: Department of Statistics, North Carolina State University, Raleigh, North Carolina
15
Li H, Gatsonis C. Combining biomarker trajectories to improve diagnostic accuracy in prospective cohort studies with verification bias. Stat Med 2019; 38:1968-1990. PMID: 30590870; DOI: 10.1002/sim.8079.
Abstract
In this paper, we develop methods to combine multiple biomarker trajectories into a composite diagnostic marker using functional data analysis (FDA) to achieve better diagnostic accuracy in monitoring disease recurrence in the setting of a prospective cohort study. In such studies, the disease status is usually verified only for patients with a positive test result in any biomarker and is missing in patients with negative test results in all biomarkers. Thus, the test result will affect disease verification, which leads to verification bias if the analysis is restricted only to the verified cases. We treat verification bias as a missing data problem. Under both missing at random (MAR) and missing not at random (MNAR) assumptions, we derive the optimal classification rules using the Neyman-Pearson lemma based on the composite diagnostic marker. We estimate thresholds adjusted for verification bias to dichotomize patients as test positive or test negative, and we evaluate the diagnostic accuracy using the verification bias corrected area under the ROC curves (AUCs). We evaluate the performance and robustness of the FDA combination approach and assess the consistency of the approach through simulation studies. In addition, we perform a sensitivity analysis of the dependency between the verification process and disease status for the approach under the MNAR assumption. We apply the proposed method on data from the Religious Orders Study and from a non-small cell lung cancer trial.
Affiliation(s)
- Hong Li: Department of Public Health Science, Medical University of South Carolina, Charleston, South Carolina
16
Zhu R, Ghosal S. Bayesian Semiparametric ROC surface estimation under verification bias. Comput Stat Data Anal 2019. DOI: 10.1016/j.csda.2018.09.003.
17
Cho H, Matthews GJ, Harel O. Confidence Intervals for the Area Under the Receiver Operating Characteristic Curve in the Presence of Ignorable Missing Data. Int Stat Rev 2019; 87:152-177. PMID: 31007356; PMCID: PMC6472951; DOI: 10.1111/insr.12277.
Abstract
Receiver operating characteristic curves are widely used as a measure of accuracy of diagnostic tests and can be summarised using the area under the receiver operating characteristic curve (AUC). Often, it is useful to construct a confidence interval for the AUC; however, because there are a number of different proposed methods to measure the variance of the AUC, there are thus many different resulting methods for constructing these intervals. In this article, we compare different methods of constructing Wald-type confidence intervals in the presence of missing data where the missingness mechanism is ignorable. We find that constructing confidence intervals using multiple imputation based on logistic regression gives the most robust coverage probability and that the choice of confidence interval method is less important. However, when the missingness rate is less severe (e.g. less than 70%), we recommend using Newcombe's Wald method for constructing confidence intervals along with multiple imputation using predictive mean matching.
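For reference, a generic Wald-type interval for the AUC using the Hanley and McNeil (1982) variance approximation on complete cases is sketched below; the multiple-imputation procedures and the specific interval variants compared in the article (including Newcombe's method) are not reproduced here.

```python
# Generic Wald-type CI for the AUC with the Hanley-McNeil variance approximation,
# on complete-case data. Shown only to fix ideas; not the article's recommended procedure.
import numpy as np
from scipy.stats import norm

def auc_wald_ci(scores, disease, alpha=0.05):
    x, d = np.asarray(scores, float), np.asarray(disease, int)
    cases, controls = x[d == 1], x[d == 0]
    m, n = len(cases), len(controls)
    auc = np.mean((cases[:, None] > controls[None, :]) + 0.5 * (cases[:, None] == controls[None, :]))
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc) + (m - 1) * (q1 - auc**2) + (n - 1) * (q2 - auc**2)) / (m * n)
    half = norm.ppf(1 - alpha / 2) * np.sqrt(var)
    return auc, max(0.0, auc - half), min(1.0, auc + half)
```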
Affiliation(s)
- Hunyong Cho: Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, USA
- Gregory J Matthews: Department of Mathematics and Statistics, Loyola University Chicago, 1032 W. Sheridan Road, Chicago, IL 60660, USA
- Ofer Harel: Department of Statistics, University of Connecticut, 215 Glenbrook Road U-4120, Storrs, CT 06269, USA
18
Zhang Y, Alonzo TA. Estimation of the volume under the receiver-operating characteristic surface adjusting for non-ignorable verification bias. Stat Methods Med Res 2018; 27:715-739. PMID: 29338546; DOI: 10.1177/0962280217742541.
Abstract
The receiver-operating characteristic surface is frequently used for presenting the accuracy of a diagnostic test for three-category classification problems. One common problem that can complicate the estimation of the volume under receiver-operating characteristic surface is that not all subjects receive the verification of the true disease status. Estimation based only on data from subjects with verified disease status may be biased, which is referred to as verification bias. In this article, we propose new verification bias correction methods to estimate the volume under receiver-operating characteristic surface for a continuous diagnostic test. We assume the verification process is missing not at random, which means the missingness might be related to unobserved clinical characteristics. Three classes of estimators are proposed, namely, inverse probability weighted, imputation-based, and doubly robust estimators. A jackknife estimator of variance is derived for all the proposed volume under receiver-operating characteristic surface estimators. The finite sample properties of the new estimators are examined via simulation studies. We illustrate our methods with data collected from Alzheimer's disease research.
Affiliation(s)
- Ying Zhang: Department of Biostatistics, University of Southern California, Keck School of Medicine, Los Angeles, CA, USA
- Todd A Alonzo: Department of Biostatistics, University of Southern California, Keck School of Medicine, Los Angeles, CA, USA
19
Asano J, Hirakawa A. Assessing the prediction accuracy of a cure model for censored survival data with long-term survivors: Application to breast cancer data. J Biopharm Stat 2017; 27:918-932. PMID: 28324665; DOI: 10.1080/10543406.2017.1293082.
Abstract
The Cox proportional hazards cure model is a survival model incorporating a cure rate with the assumption that the population contains both uncured and cured individuals. It contains a logistic regression for the cure rate, and a Cox regression to estimate the hazard for uncured patients. A single predictive model for both the cure and hazard can be developed by using a cure model that simultaneously predicts the cure rate and hazards for uncured patients; however, model selection is a challenge because of the lack of a measure for quantifying the predictive accuracy of a cure model. Recently, we developed an area under the receiver operating characteristic curve (AUC) for determining the cure rate in a cure model (Asano et al., 2014), but the hazards measure for uncured patients was not resolved. In this article, we propose novel C-statistics that are weighted by the patients' cure status (i.e., cured, uncured, or censored cases) for the cure model. The operating characteristics of the proposed C-statistics and their confidence interval were examined by simulation analyses. We also illustrate methods for predictive model selection and for further interpretation of variables using the proposed AUCs and C-statistics via application to breast cancer data.
Affiliation(s)
- Junichi Asano: Biostatistics Group, Center for Product Evaluation, Pharmaceuticals and Medical Devices Agency, Tokyo, Japan
- Akihiro Hirakawa: Biostatistics Section, Center for Advanced Medicine and Clinical Research, Nagoya University Graduate School of Medicine, Nagoya, Japan
20
He H, Wang W, Tang W. Prediction model-based kernel density estimation when group membership is subject to missing. AStA Adv Stat Anal 2016; 101:267-288. PMID: 28947920; DOI: 10.1007/s10182-016-0283-y.
Abstract
The density function is a fundamental concept in data analysis. When a population consists of heterogeneous subjects, it is often of great interest to estimate the density functions of the subpopulations. Nonparametric methods such as kernel smoothing estimates may be applied to each subpopulation to estimate the density functions if there are no missing values. In situations where the membership for a subpopulation is missing, kernel smoothing estimates using only subjects with membership available are valid only under missing completely at random (MCAR). In this paper, we propose new kernel smoothing methods for density function estimates by applying prediction models of the membership under the missing at random (MAR) assumption. The asymptotic properties of the new estimates are developed, and simulation studies and a real study in mental health are used to illustrate the performance of the new estimates.
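One plausible instantiation of the idea is sketched below with illustrative choices (a logistic membership model, Gaussian kernel, and Silverman's bandwidth) that are assumptions rather than the authors' exact estimator: predicted membership probabilities serve as weights in a kernel density estimate for the group of interest.

```python
# Hedged sketch of prediction-model-based kernel density estimation under MAR missing
# group labels: fit a membership model on labeled subjects, predict probabilities for the
# rest, and use those probabilities as KDE weights. Modeling choices are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_gaussian_kde(x, weights, grid, bandwidth):
    w = weights / weights.sum()
    diff = (grid[:, None] - x[None, :]) / bandwidth
    return (w[None, :] * np.exp(-0.5 * diff**2)).sum(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

def group_density(y, covars, group, observed, grid):
    """y: variable whose density is wanted; group: 0/1 labels (valid only where observed==1)."""
    model = LogisticRegression().fit(covars[observed == 1], group[observed == 1])
    prob = np.where(observed == 1, group, model.predict_proba(covars)[:, 1])
    bw = 1.06 * np.std(y) * len(y) ** (-1 / 5)          # Silverman's rule of thumb
    return weighted_gaussian_kde(y, prob, grid, bw)
```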
Affiliation(s)
- Hua He: Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine
- Wan Tang: Department of Biostatistics & Bioinformatics, Tulane University School of Public Health and Tropical Medicine
21
Zhang Y, Alonzo TA. Inverse probability weighting estimation of the volume under the ROC surface in the presence of verification bias. Biom J 2016; 58:1338-1356. PMID: 27338713; DOI: 10.1002/bimj.201500225.
Abstract
In diagnostic medicine, the volume under the receiver operating characteristic (ROC) surface (VUS) is a commonly used index to quantify the ability of a continuous diagnostic test to discriminate between three disease states. In practice, verification of the true disease status may be performed only for a subset of subjects under study since the verification procedure is invasive, risky, or expensive. The selection for disease examination might depend on the results of the diagnostic test and other clinical characteristics of the patients, which in turn can cause bias in estimates of the VUS. This bias is referred to as verification bias. Existing verification bias correction in three-way ROC analysis focuses on ordinal tests. We propose verification bias-correction methods to construct ROC surface and estimate the VUS for a continuous diagnostic test, based on inverse probability weighting. By applying U-statistics theory, we develop asymptotic properties for the estimator. A Jackknife estimator of variance is also derived. Extensive simulation studies are performed to evaluate the performance of the new estimators in terms of bias correction and variance. The proposed methods are used to assess the ability of a biomarker to accurately identify stages of Alzheimer's disease.
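The core of such an estimator is a weighted three-sample U-statistic. The sketch below computes the weighted proportion of correctly ordered (class 1, class 2, class 3) triples among verified subjects, with weights taken to be inverse estimated verification probabilities supplied by the caller. Ties and the jackknife variance of the paper are omitted, and the interface is an assumption.

```python
# Sketch of an inverse probability weighted estimator of the volume under the ROC
# surface (VUS) for three ordered disease classes, using only verified subjects.
# Illustrative only; ties and the paper's jackknife variance are not handled.
import numpy as np

def ipw_vus(score, true_class, weight):
    """score: test values; true_class: 1/2/3 for verified subjects; weight: 1 / estimated verification probability."""
    s1, w1 = score[true_class == 1], weight[true_class == 1]
    s2, w2 = score[true_class == 2], weight[true_class == 2]
    s3, w3 = score[true_class == 3], weight[true_class == 3]
    num = den = 0.0
    for b, wb in zip(s2, w2):
        lower = s1 < b                      # class-1 subjects below the class-2 value
        upper = s3 > b                      # class-3 subjects above it
        num += wb * np.sum(w1[lower]) * np.sum(w3[upper])
        den += wb * np.sum(w1) * np.sum(w3)
    return num / den
```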
Affiliation(s)
- Ying Zhang: Department of Biostatistics, University of Southern California, Keck School of Medicine, Los Angeles, California 90033, USA
- Todd A Alonzo: Department of Biostatistics, University of Southern California, Keck School of Medicine, Los Angeles, California 90033, USA
22
Abstract
For a continuous-scale diagnostic test, the receiver operating characteristic (ROC) curve is a popular tool for displaying the ability of the test to discriminate between healthy and diseased subjects. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the test result and other characteristics of the subjects. Estimators of the ROC curve based only on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias, in particular under the assumption that the true disease status, if missing, is missing at random (MAR). MAR assumption means that the probability of missingness depends on the true disease status only through the test result and observed covariate information. However, the existing methods require parametric models for the (conditional) probability of disease and/or the (conditional) probability of verification, and hence are subject to model misspecification: a wrong specification of such parametric models can affect the behavior of the estimators, which can be inconsistent. To avoid misspecification problems, in this paper we propose a fully nonparametric method for the estimation of the ROC curve of a continuous test under verification bias. The method is based on nearest-neighbor imputation and adopts generic smooth regression models for both the probability that a subject is diseased and the probability that it is verified. Simulation experiments and an illustrative example show the usefulness of the new method. Variance estimation is also discussed.
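A minimal sketch in the spirit of the approach: impute each unverified subject's disease probability as the average disease status of its k nearest verified neighbours in test-result/covariate space, then compute full-imputation-style true and false positive fractions. The smoothing choices and variance estimation of the paper are not reproduced; k, the distance metric, and the data layout are assumptions.

```python
# Hedged sketch of a nearest-neighbour-imputation ROC estimate under MAR verification.
# Not the paper's exact estimator; k and the Euclidean metric are illustrative choices.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_imputed_roc(y, covars, disease, verified, k=5):
    Z = np.column_stack([y, covars])
    nn = NearestNeighbors(n_neighbors=k).fit(Z[verified == 1])
    _, idx = nn.kneighbors(Z[verified == 0])
    rho = disease.astype(float).copy()
    rho[verified == 0] = disease[verified == 1][idx].mean(axis=1)   # imputed P(D=1)
    cuts = np.unique(y)[::-1]
    tpf = [(rho * (y > c)).sum() / rho.sum() for c in cuts]
    fpf = [((1 - rho) * (y > c)).sum() / (1 - rho).sum() for c in cuts]
    return np.array(fpf), np.array(tpf)
```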
23
Validity and diagnostic accuracy of the Luganda version of the 9-item and 2-item Patient Health Questionnaire for detecting major depressive disorder in rural Uganda. Glob Ment Health (Camb) 2016; 3:e20. PMID: 28596888; PMCID: PMC5314749; DOI: 10.1017/gmh.2016.14.
Abstract
BACKGROUND The prevalence of depression in rural Ugandan communities is high, and yet detection and treatment of depression in the primary care setting is suboptimal. Short valid depression screening measures may improve detection of depression. We describe the validation of the Luganda translated nine- and two-item Patient Health Questionnaires (PHQ-9 and PHQ-2) as screening tools for depression in two rural primary care facilities in Eastern Uganda. METHODS A total of 1407 adult respondents were screened consecutively using the nine-item Luganda PHQ. Of these, 212 were randomly selected to respond to the Mini International Neuropsychiatric Interview diagnostic questionnaire. Descriptive statistics for respondents' demographic characteristics and PHQ scores were generated. The sensitivity, specificity, positive predictive values (PPVs), and area under the ROC curve were determined for both the PHQ-9 and PHQ-2. RESULTS The optimum trade-off between sensitivity and PPV was at a cut-off of ≥5. The weighted area under the receiver operating characteristic curve was 0.74 (95% CI 0.60-0.89) and 0.68 (95% CI 0.54-0.82) for the PHQ-9 and PHQ-2, respectively. CONCLUSION The Luganda translation of the PHQ-9 was found to be modestly useful in detecting depression. The PHQ-9 performed only slightly better than the PHQ-2 in this rural Ugandan primary care setting. Future research could improve on diagnostic accuracy by considering the idioms of distress among Luganda speakers and revising the PHQ-9 accordingly. The usefulness of the PHQ-2 in this rural population should be viewed with caution.
24
Li S, Ning Y. Estimation of covariate-specific time-dependent ROC curves in the presence of missing biomarkers. Biometrics 2015; 71:666-76. DOI: 10.1111/biom.12312.
Affiliation(s)
- Shanshan Li: Department of Biostatistics, Indiana University School of Public Health, Indianapolis, IN 46202, USA
- Yang Ning: Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
25
Gu J, Ghosal S, Kleiner DE. Bayesian ROC curve estimation under verification bias. Stat Med 2014; 33:5081-96. PMID: 25269427; DOI: 10.1002/sim.6297.
Abstract
The receiver operating characteristic (ROC) curve has been widely used in medical science for its ability to measure the accuracy of diagnostic tests under the gold standard. However, in complicated medical practice, a gold standard test can be invasive and expensive, and its result may not always be available for all the subjects under study. Thus, a gold standard test is implemented only when it is necessary and possible. This leads to the so-called 'verification bias', meaning that subjects with verified disease status (also called label) are not selected in a completely random fashion. In this paper, we propose a new Bayesian approach for estimating an ROC curve based on continuous data following the popular semiparametric binormal model in the presence of verification bias. By using a rank-based likelihood, and following Gibbs sampling techniques, we compute the posterior distribution of the binormal intercept and slope parameters, as well as the area under the curve, by imputing the missing labels within Markov chain Monte Carlo iterations. Consistency of the resulting posterior under mild conditions is also established. We compare the new method with other comparable methods and conclude that our estimator performs well in terms of accuracy.
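For context, under the binormal model the ROC curve is ROC(t) = Φ(a + b Φ⁻¹(t)) and the AUC is Φ(a / √(1 + b²)), so posterior draws of the intercept a and slope b, however obtained (the rank-likelihood Gibbs sampler of the paper is not reproduced here), translate directly into a posterior for the AUC, as sketched below.

```python
# Binormal ROC and AUC formulas applied to posterior draws of (a, b).
# Illustrative only; generating the draws themselves is outside the scope of this sketch.
import numpy as np
from scipy.stats import norm

def binormal_roc(a, b, t):
    return norm.cdf(a + b * norm.ppf(t))

def auc_from_draws(a_draws, b_draws):
    a_draws, b_draws = np.asarray(a_draws), np.asarray(b_draws)
    auc = norm.cdf(a_draws / np.sqrt(1 + b_draws**2))
    return auc.mean(), np.percentile(auc, [2.5, 97.5])   # posterior mean and 95% credible interval
```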
Affiliation(s)
- Jiezhun Gu: Duke Clinical Research Institute, Duke University Medical Center, PO Box 17969, Durham, NC 27715, USA
26
Wang B, Qin G. Empirical Likelihood-Based Confidence Intervals for the Sensitivity of a Continuous-Scale Diagnostic Test with Missing Data. Commun Stat Theory Methods 2014. DOI: 10.1080/03610926.2012.695849.
27
Asano J, Hirakawa A, Hamada C. Assessing the prediction accuracy of cure in the Cox proportional hazards cure model: an application to breast cancer data. Pharm Stat 2014; 13:357-63. PMID: 25044997; DOI: 10.1002/pst.1630.
Abstract
A cure rate model is a survival model incorporating the cure rate with the assumption that the population contains both uncured and cured individuals. It is a powerful statistical tool for prognostic studies, especially in cancer. The cure rate is important for making treatment decisions in clinical practice. The proportional hazards (PH) cure model can predict the cure rate for each patient. This contains a logistic regression component for the cure rate and a Cox regression component to estimate the hazard for uncured patients. A measure for quantifying the predictive accuracy of the cure rate estimated by the Cox PH cure model is required, as there has been a lack of previous research in this area. We used the Cox PH cure model for the breast cancer data; however, the area under the receiver operating characteristic curve (AUC) could not be estimated because many patients were censored. In this study, we used imputation-based AUCs to assess the predictive accuracy of the cure rate from the PH cure model. We examined the precision of these AUCs using simulation studies. The results demonstrated that the imputation-based AUCs were estimable and their biases were negligibly small in many cases, although ordinary AUC could not be estimated. Additionally, we introduced the bias-correction method of imputation-based AUCs and found that the bias-corrected estimate successfully compensated the overestimation in the simulation studies. We also illustrated the estimation of the imputation-based AUCs using breast cancer data.
Affiliation(s)
- Junichi Asano: Biostatistics Group, Center for Product Evaluation, Pharmaceuticals and Medical Devices Agency, Tokyo, 100-0013, Japan
28
Collins J, Huynh M. Estimation of diagnostic test accuracy without full verification: a review of latent class methods. Stat Med 2014; 33:4141-69. PMID: 24910172; DOI: 10.1002/sim.6218.
Abstract
The performance of a diagnostic test is best evaluated against a reference test that is without error. For many diseases, this is not possible, and an imperfect reference test must be used. However, diagnostic accuracy estimates may be biased if inaccurately verified status is used as the truth. Statistical models have been developed to handle this situation by treating disease as a latent variable. In this paper, we conduct a systematized review of statistical methods using latent class models for estimating test accuracy and disease prevalence in the absence of complete verification.
Affiliation(s)
- John Collins: Rehabilitation Medicine Department, National Institutes of Health, Bethesda, MD 20892, USA
29
Douglas PS, Hoffmann U, Lee KL, Mark DB, Al-Khalidi HR, Anstrom K, Dolor RJ, Kosinski A, Krucoff MW, Mudrick DW, Patel MR, Picard MH, Udelson JE, Velazquez EJ, Cooper L. PROspective Multicenter Imaging Study for Evaluation of chest pain: rationale and design of the PROMISE trial. Am Heart J 2014; 167:796-803.e1. PMID: 24890527; PMCID: PMC4044617; DOI: 10.1016/j.ahj.2014.03.003.
Abstract
BACKGROUND Suspected coronary artery disease (CAD) is one of the most common, potentially life-threatening diagnostic problems clinicians encounter. However, no large outcome-based randomized trials have been performed to guide the selection of diagnostic strategies for these patients. METHODS The PROMISE study is a prospective, randomized trial comparing the effectiveness of 2 initial diagnostic strategies in patients with symptoms suspicious for CAD. Patients are randomized to either (1) functional testing (exercise electrocardiogram, stress nuclear imaging, or stress echocardiogram) or (2) anatomical testing with ≥64-slice multidetector coronary computed tomographic angiography. Tests are interpreted locally in real time by subspecialty certified physicians, and all subsequent care decisions are made by the clinical care team. Sites are provided results of central core laboratory quality and completeness assessment. All subjects are followed up for ≥1 year. The primary end point is the time to occurrence of the composite of death, myocardial infarction, major procedural complications (stroke, major bleeding, anaphylaxis, and renal failure), or hospitalization for unstable angina. RESULTS More than 10,000 symptomatic subjects were randomized in 3.2 years at 193 US and Canadian cardiology, radiology, primary care, urgent care, and anesthesiology sites. CONCLUSION Multispecialty community practice enrollment into a large pragmatic trial of diagnostic testing strategies is both feasible and efficient. The PROMISE trial will compare the clinical effectiveness of an initial strategy of functional testing against an initial strategy of anatomical testing in symptomatic patients with suspected CAD. Quality of life, resource use, cost-effectiveness, and radiation exposure will be assessed.
Collapse
Affiliation(s)
- Pamela S Douglas
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC
- Udo Hoffmann
- Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Kerry L Lee
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC
- Daniel B Mark
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC
- Hussein R Al-Khalidi
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC
- Kevin Anstrom
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC
- Rowena J Dolor
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC
- Andrzej Kosinski
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC
- Mitchell W Krucoff
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC
- Daniel W Mudrick
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC; McConnell Heart Health Center, Columbus, OH
- Manesh R Patel
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC
- Michael H Picard
- Massachusetts General Hospital, Harvard Medical School, Boston, MA
- James E Udelson
- Tufts Medical Center, Tufts University School of Medicine, Boston, MA
- Eric J Velazquez
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC
- Lawton Cooper
- National Heart, Lung, and Blood Institute, Bethesda, MD
30
Ringham BM, Alonzo TA, Brinton JT, Kreidler SM, Munjal A, Muller KE, Glueck DH. Reducing decision errors in the paired comparison of the diagnostic accuracy of screening tests with Gaussian outcomes. BMC Med Res Methodol 2014; 14:37. [PMID: 24597517 PMCID: PMC4015908 DOI: 10.1186/1471-2288-14-37] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2013] [Accepted: 02/26/2014] [Indexed: 11/12/2022] Open
Abstract
Background Scientists often use a paired comparison of the areas under the receiver operating characteristic curves to decide which continuous cancer screening test has the best diagnostic accuracy. In the paired design, all participants are screened with both tests. Participants with suspicious results or signs and symptoms of disease receive the reference standard test. The remaining participants are classified as non-cases, even though some may have occult disease. The standard analysis includes all study participants, which can create bias in the estimates of diagnostic accuracy since not all participants receive disease status verification. We propose a weighted maximum likelihood bias correction method to reduce decision errors. Methods Using Monte Carlo simulations, we assessed the method’s ability to reduce decision errors across a range of disease prevalences, correlations between screening test scores, rates of interval cases and proportions of participants who received the reference standard test. Results The performance of the method depends on characteristics of the screening tests and the disease and on the percentage of participants who receive the reference standard test. In studies with a large amount of bias in the difference in the full areas under the curves, the bias correction method reduces the Type I error rate and improves power for the correct decision. We demonstrate the method with an application to a hypothetical oral cancer screening study. Conclusion The bias correction method reduces decision errors for some paired screening trials. In order to determine if bias correction is needed for a specific screening trial, we recommend the investigator conduct a simulation study using our software.
Affiliation(s)
- Brandy M Ringham
- Center for Cancer Prevention and Control Research, University of California, Los Angeles, 650 Charles Young Drive South, Room A2-125 CHS, Los Angeles CA 90095, USA.
31
Albert PS, Liu A, Nansel T. Efficient logistic regression designs under an imperfect population identifier. Biometrics 2013; 70:175-84. [PMID: 24261471 DOI: 10.1111/biom.12106] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2012] [Revised: 07/01/2013] [Accepted: 08/01/2013] [Indexed: 11/28/2022]
Abstract
Motivated by actual study designs, this article considers efficient logistic regression designs where the population is identified with a binary test that is subject to diagnostic error. We consider the case where the imperfect test is obtained on all participants, while the gold standard test is measured on a small chosen subsample. Under maximum-likelihood estimation, we evaluate the optimal design in terms of sample selection as well as verification. We show that there may be substantial efficiency gains by choosing a small percentage of individuals who test negative on the imperfect test for inclusion in the sample (e.g., verifying 90% test-positive cases). We also show that a two-stage design may be a good practical alternative to a fixed design in some situations. Under optimal and nearly optimal designs, we compare maximum-likelihood and semi-parametric efficient estimators under correct and misspecified models with simulations. The methodology is illustrated with an analysis from a diabetes behavioral intervention trial.
Affiliation(s)
- Paul S Albert
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, 6100 Executive Blvd. Room 7B05, Bethesda, Maryland, 20892, U.S.A
32
Liu D, Zhou XH. Covariate adjustment in estimating the area under ROC curve with partially missing gold standard. Biometrics 2013; 69:91-100. [PMID: 23410529 PMCID: PMC3622116 DOI: 10.1111/biom.12001] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
In ROC analysis, covariate adjustment is advocated when covariates affect the magnitude or accuracy of the test under study. Meanwhile, for many large-scale screening tests, the true condition status may be subject to missingness because it is expensive and/or invasive to ascertain the disease status. A complete-case analysis may end up with biased inference, a problem known as "verification bias." To address covariate adjustment with verification bias in ROC analysis, we propose several estimators for the area under the covariate-specific ROC curve (AUC_x) and the area under the covariate-adjusted ROC curve (AAUC). The AUC_x is modeled directly in a binary regression form, and the estimating equations are based on U-statistics. The AAUC is estimated as a weighted average of the AUC_x over the covariate distribution of the diseased subjects. We employ reweighting and imputation techniques to overcome the verification bias problem. Our proposed estimators are initially derived assuming that the true disease status is missing at random (MAR), and then, with some modification, the estimators can be extended to the not-missing-at-random (NMAR) situation. The asymptotic distributions are derived for the proposed estimators. The finite-sample performance is evaluated by a series of simulation studies. Our method is applied to a data set in Alzheimer's disease research.
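As a concrete companion to the imputation idea, the sketch below computes a full-imputation AUC under MAR: a working disease model is fit to the verified subjects, the fitted probabilities stand in for the unknown disease status of everyone, and a weighted Mann-Whitney statistic is evaluated. It is a minimal illustration under an assumed logistic disease model and simulated data, not the U-statistic estimating-equation estimators of the paper; all names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_auc(score, case_w, ctrl_w):
    """Weighted Mann-Whitney AUC: pair (i, j) contributes case_w[i] * ctrl_w[j];
    ties count 1/2 and self-pairs are excluded."""
    kernel = (score[:, None] > score[None, :]) + 0.5 * (score[:, None] == score[None, :])
    w = case_w[:, None] * ctrl_w[None, :]
    np.fill_diagonal(w, 0.0)
    return (w * kernel).sum() / w.sum()

# Simulated data: biomarker t, covariate x, verification v, disease d observed only when v == 1
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + x))))
t = rng.normal(loc=1.2 * d + 0.3 * x)
v = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.5 * t))))       # verification favors high scores

design = np.column_stack([t, x])
# Full imputation: disease model fit on verified subjects, fitted probabilities applied to everyone
rho = LogisticRegression().fit(design[v == 1], d[v == 1]).predict_proba(design)[:, 1]
auc_fi = weighted_auc(t, rho, 1 - rho)

# Naive complete-case AUC for comparison (generally biased under this selection)
ver = v == 1
auc_cc = weighted_auc(t[ver], (d[ver] == 1).astype(float), (d[ver] == 0).astype(float))
print(round(auc_fi, 3), round(auc_cc, 3))
```

Replacing rho and 1 - rho with inverse-probability-of-verification weights for the verified subjects gives the reweighting analogue of the same weighted statistic.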
Affiliation(s)
- Danping Liu
- National Alzheimer's Coordinating Center, University of Washington, Seattle, Washington 98195, USA.
33
Wang X, Ma J, George S, Zhou H. Estimation of AUC or Partial AUC under Test-Result-Dependent Sampling. Stat Biopharm Res 2012; 4:313-323. [PMID: 23393612 DOI: 10.1080/19466315.2012.692514] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Abstract
The area under the ROC curve (AUC) and partial area under the ROC curve (pAUC) are summary measures used to assess the accuracy of a biomarker in discriminating true disease status. The standard sampling approach used in biomarker validation studies is often inefficient and costly, especially when ascertaining the true disease status is costly and invasive. To improve efficiency and reduce the cost of biomarker validation studies, we consider a test-result-dependent sampling (TDS) scheme, in which subject selection for determining the disease state is dependent on the result of a biomarker assay. We first estimate the test-result distribution using data arising from the TDS design. With the estimated empirical test-result distribution, we propose consistent nonparametric estimators for AUC and pAUC and establish the asymptotic properties of the proposed estimators. Simulation studies show that the proposed estimators have good finite sample properties and that the TDS design yields more efficient AUC and pAUC estimates than a simple random sampling (SRS) design. A data example based on an ongoing cancer clinical trial is provided to illustrate the TDS design and the proposed estimators. This work can find broad applications in design and analysis of biomarker validation studies.
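The following sketch illustrates the design-weighting intuition behind test-result-dependent sampling, not the paper's likelihood-based estimators: disease status is ascertained with a probability that depends on which biomarker stratum a subject falls in, and verified case-control pairs are reweighted by the inverse of their stratum's empirical sampling fraction. The strata, cut points, and data are hypothetical.

```python
import numpy as np

def tds_auc(score, verified, disease, cut_points):
    """AUC under test-result-dependent sampling: verified subjects are weighted by
    N_stratum / n_verified_stratum, the inverse of the empirical sampling fraction.
    Unverified entries of `disease` are never used."""
    strata = np.digitize(score, cut_points)
    w = np.zeros(len(score))
    for s in np.unique(strata):
        in_s = strata == s
        w[in_s] = in_s.sum() / verified[in_s].sum()
    ver = verified == 1
    case, ctrl = ver & (disease == 1), ver & (disease == 0)
    diff = score[case][:, None] - score[ctrl][None, :]
    kernel = (diff > 0) + 0.5 * (diff == 0)
    ww = w[case][:, None] * w[ctrl][None, :]
    return (ww * kernel).sum() / ww.sum()

# Hypothetical TDS design: oversample the middle of the biomarker range for verification
rng = np.random.default_rng(5)
n = 4000
d = rng.binomial(1, 0.3, size=n)
y = rng.normal(loc=d)
strata = np.digitize(y, [0.0, 1.0])                    # three strata: low / middle / high
v = rng.binomial(1, np.array([0.2, 0.8, 0.2])[strata])
print(round(tds_auc(y, v, np.where(v == 1, d, 0), [0.0, 1.0]), 3))
```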
Affiliation(s)
- Xiaofei Wang
- Department of Biostatistics & Bioinformatics, Duke University Medical Center, DUMC 2717, Durham, N.C. 27710, U.S.A
34
Abstract
The receiver operating characteristic (ROC) curve is often used to evaluate the performance of a biomarker measured on a continuous scale to predict disease status or a clinical condition. Motivated by the need for novel study designs with better estimation efficiency and reduced study cost, we consider a biased sampling scheme that consists of a simple random component (SRC) and a supplemental test-result-dependent component (TDC). Using this approach, investigators can oversample or undersample subjects falling into certain regions of the biomarker measure, yielding improved precision for the estimation of the ROC curve with a fixed sample size. Test-result-dependent sampling will introduce bias in estimating the predictive accuracy of the biomarker if standard ROC estimation methods are used. In this article, we discuss three approaches for analyzing data of a test-result-dependent structure, with a special focus on the empirical likelihood method. We establish asymptotic properties of the empirical likelihood estimators for covariate-specific ROC curves and covariate-independent ROC curves and give their corresponding variance estimators. Simulation studies show that the empirical likelihood method has good properties and is more efficient than alternative methods. Recommendations on the number of regions, cutoff points, and subject allocation are made based on the simulation results. The proposed methods are illustrated with a data example based on an ongoing lung cancer clinical trial.
Affiliation(s)
- Xiaofei Wang
- Department of Biostatistics & Bioinformatics, Duke University Medical Center, Durham, NC 27710, USA.
35
Evaluating imaging and computer-aided detection and diagnosis devices at the FDA. Acad Radiol 2012; 19:463-77. [PMID: 22306064 DOI: 10.1016/j.acra.2011.12.016] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2011] [Revised: 12/22/2011] [Accepted: 12/28/2011] [Indexed: 11/22/2022]
Abstract
This report summarizes the Joint FDA-MIPS Workshop on Methods for the Evaluation of Imaging and Computer-Assist Devices. The purpose of the workshop was to gather information on the current state of the science and to facilitate consensus development on statistical methods and study designs for the evaluation of imaging devices to support US Food and Drug Administration submissions. Participants were also expected to identify gaps in knowledge and unmet needs that should be addressed in future research. This summary is intended to document the topics discussed at the meeting and to disseminate the lessons learned through past studies of imaging and computer-aided detection and diagnosis device performance.
36
Pinsky PF, Gallas B. Enriched designs for assessing discriminatory performance--analysis of bias and variance. Stat Med 2011; 31:501-15. [PMID: 22095795 DOI: 10.1002/sim.4432] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2010] [Revised: 09/01/2011] [Accepted: 09/16/2011] [Indexed: 11/10/2022]
Abstract
In evaluating discriminatory performance of a new modality in a screening setting, a logistical constraint is that the prevalence of the disease of interest is typically very low. This implies that under a standard study design large numbers of subjects have to be evaluated using the new modality. However, if a predicate modality exists in clinical practice, one can base inclusion into the study of the new modality on the clinical results from the predicate to 'enrich' the population of diseased subjects in the study. If this enrichment is not accounted for when estimating sensitivity, specificity, and area under the ROC curve, these 'naive' estimates may be substantially biased compared with expected performance in the intended use population. We derive expressions for the magnitude of this bias in terms of correlations of modality scores. When such estimates are 'corrected' for the sampling weights using inverse probability weighting, the variances of the estimates of the above quantities are affected. We derive here analytic expressions for these variances. For a fixed number of diseased subjects, differential sampling increases the variance of the (corrected) estimates, all other things being equal. However, differential sampling also increases the number with disease for fixed total study size, which decreases the variance of the sensitivity and area under the ROC curve estimates, all other things being equal. The balance of these two effects determines the gain in efficiency when using enrichment and corrected estimates. These principles are illustrated with a simulation study motivated by the Digital Mammographic Imaging Screening Trial study, a trial of digital versus screen film mammography.
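A small simulation makes the enrichment mechanics and the inverse-probability correction concrete. Disease status is known for everyone enrolled; the distortion comes only from the predicate-dependent sampling into the study. The correlation, thresholds, and sampling probabilities below are arbitrary illustrative choices, not values from the Digital Mammographic Imaging Screening Trial.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
d = rng.binomial(1, 0.01, size=n)                      # rare disease, as in screening

# Correlated scores for the predicate and the new modality (hypothetical setup)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n)
predicate = z[:, 0] + 1.5 * d
new_test = z[:, 1] + 1.5 * d

# Enrichment: subjects positive on the predicate are oversampled into the study
pred_pos = predicate > 1.0
p_sample = np.where(pred_pos, 0.9, 0.02)
in_study = rng.binomial(1, p_sample).astype(bool)

thr = 1.0
test_pos = new_test[in_study] > thr
d_s = d[in_study].astype(bool)
w = 1.0 / p_sample[in_study]                           # inverse sampling weights

# Naive estimates ignore the enrichment
naive_se = test_pos[d_s].mean()
naive_sp = (~test_pos[~d_s]).mean()

# Weighted estimates recover the intended-use population
ipw_se = (w[d_s] * test_pos[d_s]).sum() / w[d_s].sum()
ipw_sp = (w[~d_s] * ~test_pos[~d_s]).sum() / w[~d_s].sum()

# True values in the full population, for reference
true_se = (new_test[d == 1] > thr).mean()
true_sp = (new_test[d == 0] <= thr).mean()
print(naive_se, ipw_se, true_se)
print(naive_sp, ipw_sp, true_sp)
```

Because the new modality's score is correlated with the predicate that drives enrollment, the naive sensitivity is inflated and the naive specificity deflated, while the weighted versions track the population values, which is the qualitative pattern the abstract describes.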
Affiliation(s)
- Paul F Pinsky
- Division of Imaging and Applied Mathematics, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD, USA.
37
He H, McDermott MP. A robust method using propensity score stratification for correcting verification bias for binary tests. Biostatistics 2011; 13:32-47. [PMID: 21856650 DOI: 10.1093/biostatistics/kxr020] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.
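The sketch below is a rough rendering of the propensity-stratification idea for a binary test: model verification separately for test-positives and test-negatives, cut the estimated propensity scores into strata, and extrapolate each stratum's verified disease rate to all of its members. It assumes MAR within strata and that every stratum contains verified subjects; the function, covariate matrix, and simulated data are hypothetical, and no claim is made that this reproduces the authors' exact estimator or its robustness properties.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ps_stratified_accuracy(t, v, d, x, n_strata=5):
    """Verification-bias-corrected sensitivity and specificity for a binary test
    via propensity-score stratification (rough sketch).  The verification
    propensity is modeled separately for test-positives and test-negatives,
    subjects are grouped into propensity strata, and each stratum's verified
    disease rate is extrapolated to all of its members."""
    p_dis = np.zeros(len(t))                       # estimated P(D = 1) per subject
    for arm in (0, 1):                             # test-negatives, then test-positives
        idx = np.where(t == arm)[0]
        ps = LogisticRegression().fit(x[idx], v[idx]).predict_proba(x[idx])[:, 1]
        cuts = np.quantile(ps, np.linspace(0, 1, n_strata + 1)[1:-1])
        strata = np.digitize(ps, cuts)
        for s in np.unique(strata):
            in_s = idx[strata == s]
            verified = in_s[v[in_s] == 1]
            p_dis[in_s] = d[verified].mean()       # verified disease rate in the stratum
    sens = (p_dis * (t == 1)).sum() / p_dis.sum()
    spec = ((1 - p_dis) * (t == 0)).sum() / (1 - p_dis).sum()
    return sens, spec

# Hypothetical usage with simulated data; disease status is only used where v == 1
rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=(n, 2))
d = rng.binomial(1, 1 / (1 + np.exp(-(-1 + x[:, 0]))))
t = rng.binomial(1, np.where(d == 1, 0.85, 0.15))
v = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 2 * t + 0.5 * x[:, 1]))))
print(ps_stratified_accuracy(t, v, np.where(v == 1, d, 0), x))
```

Stratifying on the propensity score, rather than plugging the fitted probabilities directly into weights, is what gives the approach its robustness to mild misspecification of the verification model.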
Affiliation(s)
- Hua He
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, USA
38
Liu D, Zhou XH. Semiparametric estimation of the covariate-specific ROC curve in presence of ignorable verification bias. Biometrics 2011; 67:906-16. [PMID: 21361890 DOI: 10.1111/j.1541-0420.2011.01562.x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Covariate-specific receiver operating characteristic (ROC) curves are often used to evaluate the classification accuracy of a medical diagnostic test or a biomarker when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness due to high cost or harm to the patient. In this article, we propose a semiparametric estimator of the covariate-specific ROC curve with a partially missing gold standard. A location-scale model is constructed for the test result to capture the covariates' effects, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. Under the missing-at-random (MAR) assumption for the gold standard, we consider weighted estimating equations for the location-scale parameters and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared: imputation-based, inverse probability weighted, and doubly robust. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form of the standard error estimator. The proposed method is motivated by, and applied to, data from an Alzheimer's disease study.
Affiliation(s)
- Danping Liu
- Department of Biostatistics, University of Washington, Seattle, Washington 98195, USA.
39
He H, Lyness JM, McDermott MP. Direct estimation of the area under the receiver operating characteristic curve in the presence of verification bias. Stat Med 2009; 28:361-76. [PMID: 18680124 DOI: 10.1002/sim.3388] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
The area under a receiver operating characteristic (ROC) curve (AUC) is a commonly used index for summarizing the ability of a continuous diagnostic test to discriminate between healthy and diseased subjects. If all subjects have their true disease status verified, one can directly estimate the AUC nonparametrically using the Wilcoxon statistic. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Because estimators of the AUC based only on verified subjects are typically biased, it is common to estimate the AUC from a bias-corrected ROC curve. The variance of the estimator, however, does not have a closed-form expression and thus resampling techniques are used to obtain an estimate. In this paper, we develop a new method for directly estimating the AUC in the setting of verification bias based on U-statistics and inverse probability weighting (IPW). Closed-form expressions for the estimator and its variance are derived. We also show that the new estimator is equivalent to the empirical AUC derived from the bias-corrected ROC curve arising from the IPW approach.
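The point estimator described here has a simple closed form: an inverse-probability-weighted Wilcoxon statistic over verified case-control pairs. The sketch below implements that form with a logistic working model for the verification probability; it is a minimal illustration under MAR with hypothetical names and simulated data, and it omits the paper's closed-form variance expression.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_auc(score, verified, disease, covar):
    """Verification-bias-corrected AUC as an inverse-probability-weighted
    Wilcoxon/U-statistic.  Verification is modeled given the test score and a
    covariate (MAR); verified cases and controls are weighted by 1 / pi.
    Unverified entries of `disease` are never used."""
    design = np.column_stack([score, covar])
    pi = LogisticRegression().fit(design, verified).predict_proba(design)[:, 1]
    ver = verified == 1
    case = ver & (disease == 1)
    ctrl = ver & (disease == 0)
    w1, w0 = 1.0 / pi[case], 1.0 / pi[ctrl]
    diff = score[case][:, None] - score[ctrl][None, :]
    kernel = (diff > 0) + 0.5 * (diff == 0)
    return (w1[:, None] * w0[None, :] * kernel).sum() / (w1.sum() * w0.sum())

# Hypothetical data: verification probability increases with the test score
rng = np.random.default_rng(3)
n = 3000
x = rng.normal(size=n)
d = rng.binomial(1, 0.3, size=n)
t = rng.normal(loc=d + 0.2 * x)
v = rng.binomial(1, 1 / (1 + np.exp(-(2 * t - 0.5))))
print(round(ipw_auc(t, v, np.where(v == 1, d, 0), x), 3))
```

The appeal of the direct approach is that the point estimate falls out of one weighted pairwise sum rather than an integration of a bias-corrected ROC curve.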
Affiliation(s)
- Hua He
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Ave, Box 630, Rochester, NY 14642, USA
40
Glueck DH, Lamb MM, O'Donnell CI, Ringham BM, Brinton JT, Muller KE, Lewin JM, Alonzo TA, Pisano ED. Bias in trials comparing paired continuous tests can cause researchers to choose the wrong screening modality. BMC Med Res Methodol 2009; 9:4. [PMID: 19154609 PMCID: PMC2657218 DOI: 10.1186/1471-2288-9-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2008] [Accepted: 01/20/2009] [Indexed: 11/10/2022] Open
Abstract
Background To compare the diagnostic accuracy of two continuous screening tests, a common approach is to test the difference between the areas under the receiver operating characteristic (ROC) curves. After study participants are screened with both screening tests, the disease status is determined as accurately as possible, either by an invasive, sensitive and specific secondary test, or by a less invasive, but less sensitive approach. For most participants, disease status is approximated through the less sensitive approach. The invasive test must be limited to the fraction of the participants whose results on either or both screening tests exceed a threshold of suspicion, or who develop signs and symptoms of the disease after the initial screening tests. The limitations of this study design lead to a bias in the ROC curves we call paired screening trial bias. This bias reflects the synergistic effects of inappropriate reference standard bias, differential verification bias, and partial verification bias. The absence of a gold reference standard leads to inappropriate reference standard bias. When different reference standards are used to ascertain disease status, it creates differential verification bias. When only suspicious screening test scores trigger a sensitive and specific secondary test, the result is a form of partial verification bias. Methods For paired screening tests with bivariate normally distributed scores, we give formulae and programs to quantify the effect of paired screening trial bias on a paired comparison of area under the curves. We fix the prevalence of disease, and the chance a diseased subject manifests signs and symptoms. We derive the formulas for true sensitivity and specificity, and those for the sensitivity and specificity observed by the study investigator. Results The observed area under the ROC curves is quite different from the true area under the ROC curves. The typical direction of the bias is a strong inflation in sensitivity, paired with a concomitant slight deflation of specificity. Conclusion In paired trials of screening tests, when area under the ROC curve is used as the metric, bias may lead researchers to make the wrong decision as to which screening test is better.
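A toy Monte Carlo can reproduce the flavor of paired screening trial bias: both screening scores are bivariate normal, the gold standard is applied only to suspicious screens or symptomatic cases, and everyone else is classified as a non-case. The prevalence, thresholds, and effect sizes below are arbitrary illustrative choices, not the authors' formulas or software.

```python
import numpy as np

def auc(score, label):
    """Empirical AUC via the Mann-Whitney statistic (ties count 1/2)."""
    s1, s0 = score[label == 1], score[label == 0]
    diff = s1[:, None] - s0[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

rng = np.random.default_rng(11)
n, prevalence, p_symptomatic, suspicion = 10_000, 0.05, 0.5, 1.5
d = rng.binomial(1, prevalence, size=n)

# Paired screening scores: correlated noise plus a disease effect for each test
noise = rng.multivariate_normal([0, 0], [[1, 0.4], [0.4, 1]], size=n)
scores = noise + np.column_stack([1.2 * d, 0.9 * d])

# Gold standard only for suspicious screens or symptomatic cases;
# everyone else is treated as a non-case, even with occult disease
suspicious = (scores[:, 0] > suspicion) | (scores[:, 1] > suspicion)
symptomatic = (d == 1) & (rng.random(n) < p_symptomatic)
observed_d = np.where(suspicious | symptomatic, d, 0)

for k, name in enumerate(["test A", "test B"]):
    print(name,
          "true AUC %.3f" % auc(scores[:, k], d),
          "observed AUC %.3f" % auc(scores[:, k], observed_d))
```

Because only high-scoring or symptomatic cases get their disease confirmed, the observed curves are computed against a partly mislabeled reference, and the two tests can even switch rank relative to their true performance, which is the decision error the abstract warns about.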
Affiliation(s)
- Deborah H Glueck
- Department of Biostatistics, Colorado School of Public Health, University of Colorado, Denver, Aurora, CO, USA.