1
Qiu SF, Lei J, Poon WY, Tang ML, Wong RS, Tao JR. Sample size determination for interval estimation of the prevalence of a sensitive attribute under non-randomized response models. Br J Math Stat Psychol 2024. [PMID: 38409814 DOI: 10.1111/bmsp.12338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 01/30/2024] [Accepted: 02/01/2024] [Indexed: 02/28/2024]
Abstract
Surveys with sensitive questions should include enough participants to adequately address the research question. In this paper, sample size formulas/iterative algorithms are developed from the perspective of controlling the confidence interval width of the prevalence of a sensitive attribute under four non-randomized response models: the crosswise model, parallel model, Poisson item count technique model and negative binomial item count technique model. In contrast to the conventional approach for sample size determination, our sample size formulas/algorithms explicitly incorporate an assurance probability of controlling the width of a confidence interval within the pre-specified range. The performance of the proposed methods is evaluated with respect to the empirical coverage probability, empirical assurance probability and confidence width. Simulation results show that all formulas/algorithms are effective and hence are recommended for practical applications. A real example is used to illustrate the proposed methods.
Affiliation(s)
- Shi-Fang Qiu
  - Department of Statistics, Chongqing University of Technology, Chongqing, China
- Jie Lei
  - Department of Statistics, Chongqing University of Technology, Chongqing, China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
- Man-Lai Tang
  - Centre of Data Innovation Research, Department of Physics, Astronomy & Mathematics, School of Physics, Engineering & Computer Science, University of Hertfordshire, College Lane, Hatfield, UK
- Ricky S Wong
  - Business School, University of Hertfordshire, Hatfield, UK
- Ji-Ran Tao
  - Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China
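As background for the crosswise model named in the abstract of entry 1, the following is a minimal sketch of the standard textbook prevalence estimator and a Wald-type interval. It is not the sample size procedure of the paper, which additionally uses an assurance probability. It assumes each respondent reports only whether their answers to the sensitive question and to an unrelated question with known "yes" probability `p` are the same; the function and parameter names (`crosswise_estimate`, `n_same`, `p`) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def crosswise_estimate(n_same, n, p, conf_level=0.95):
    """Point estimate and Wald-type CI for prevalence under the crosswise model.

    Assumption (not taken from the paper): P(same) = pi*p + (1 - pi)*(1 - p),
    where pi is the prevalence and p != 0.5 is the known probability of the
    non-sensitive attribute.
    """
    if p == 0.5:
        raise ValueError("p must differ from 0.5 for pi to be identifiable")
    lam_hat = n_same / n                        # observed proportion of "same"
    pi_hat = (lam_hat + p - 1) / (2 * p - 1)    # invert the mixing equation
    se = sqrt(lam_hat * (1 - lam_hat) / n) / abs(2 * p - 1)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    lower = max(0.0, pi_hat - z * se)           # truncate to [0, 1]
    upper = min(1.0, pi_hat + z * se)
    return pi_hat, lower, upper


# Illustrative numbers only: 325 "same" answers out of 500 with p = 0.25.
estimate, lower, upper = crosswise_estimate(325, 500, 0.25)
```

The interval width shrinks roughly as 1/sqrt(n) and grows as `p` approaches 0.5, which is why the required sample size depends strongly on the design probability.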
2
Qiu SF, Wang LM, Tang ML, Poon WY. Confidence interval construction for proportion difference from partially validated series with two fallible classifiers. J Biopharm Stat 2022; 32:871-896. [PMID: 35536693 DOI: 10.1080/10543406.2022.2058527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
This article investigates the confidence interval (CI) construction of proportion difference for two independent partially validated series under the double-sampling scheme in which both classifiers are fallible. Several CIs based on the variance estimates recovery method of combining confidence limits from asymptotic, bootstrap, and Bayesian methods for two independent binomial proportions are developed under two models. Simulation results show that all CIs except for the bootstrap percentile-t CI and Bayesian credible interval with uniform prior under the independence model and all CIs under the dependence model generally perform well and are recommended. Two examples are used to illustrate the methodologies.
Affiliation(s)
- Shi-Fang Qiu
  - Department of Statistics and Data Science, Chongqing University of Technology, Chongqing, China
- Li-Ming Wang
  - Department of Statistics and Data Science, Chongqing University of Technology, Chongqing, China
  - Chongqing Industry Polytechnic College, China
- Man-Lai Tang
  - Department of Mathematics, Statistics and Insurance, Hang Seng University of Hong Kong, Hong Kong, China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
3
Zhong J, Wen MJ, Cheung SH, Poon WY. Simultaneous tests of non inferiority and superiority in three-arm clinical studies with heterogeneous variance. COMMUN STAT-THEOR M 2022. [DOI: 10.1080/03610926.2020.1747082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Junjiang Zhong
  - School of Applied Mathematics, Xiamen University of Technology, Xiamen, China
- Miin-Jye Wen
  - Department of Statistics, Institute of Data Science, and Institute of International Management, National Cheng Kung University, Tainan, Taiwan
- Siu Hung Cheung
  - Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen, China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
4
Han Y, Lu ZH, Poon WY. Noninferiority testing for matched-pair ordinal data with misclassification. Stat Med 2019; 38:5332-5349. [PMID: 31637752 DOI: 10.1002/sim.8364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2018] [Revised: 07/21/2019] [Accepted: 08/18/2019] [Indexed: 11/11/2022]
Abstract
New treatments that are noninferior or equivalent to, but not necessarily superior to, the reference treatment may still be beneficial to patients because they have fewer side effects, are more convenient, take less time, or cost less. The noninferiority test is widely used in medical research to provide guidance in such situations. In addition, categorical variables are frequently encountered in medical research, such as in studies involving patient-reported outcomes. In this paper, we develop a noninferiority testing procedure for correlated ordinal categorical variables based on a paired design with a latent normal distribution approach. Misclassification is frequently encountered in the collection of ordinal categorical data; therefore, we further extend the procedure to account for misclassification using information in the partially validated data. Simulation studies are conducted to investigate the accuracy of the estimates, the type I error rates, and the power of the proposed procedure. Finally, we analyze one substantive example to demonstrate the utility of the proposed approach.
Affiliation(s)
- Yuanyuan Han
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee
- Zhao-Hua Lu
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong
5
Qiu SF, He J, Tao JR, Tang ML, Poon WY. Comparison of disease prevalence in two populations under double-sampling scheme with two fallible classifiers. J Appl Stat 2019; 47:1375-1401. [DOI: 10.1080/02664763.2019.1679727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Shi-Fang Qiu
  - Department of Statistics, Chongqing University of Technology, Chongqing, People's Republic of China
- Jie He
  - Department of Statistics, Chongqing University of Technology, Chongqing, People's Republic of China
- Ji-Ran Tao
  - School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, People's Republic of China
- Man-Lai Tang
  - Department of Mathematics and Statistics, Hang Seng University of Hong Kong, Hong Kong, People's Republic of China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, People's Republic of China
6
Qiu SF, Poon WY, Tang ML, Tao JR. Construction of confidence intervals for the risk differences in stratified design with correlated bilateral data. J Biopharm Stat 2019; 29:446-467. [PMID: 30933654 DOI: 10.1080/10543406.2019.1579222] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/27/2022]
Abstract
A stratified study is often designed to adjust for a confounding effect or for the effect of different centers/groups when comparing two treatments or diagnostic tests, and the risk difference is one of the most frequently used indices in comparing efficiency between two treatments or diagnostic tests. This article presents five simultaneous confidence intervals (CIs) for risk differences in stratified bilateral designs accounting for the intraclass correlation and develops seven CIs for the common risk difference under the homogeneity assumption. The performance of the CIs is evaluated with respect to the empirical coverage probabilities, empirical coverage widths and ratios of the mesial noncoverage probability to the noncoverage probability under various scenarios. Empirical results show that the Wald simultaneous CI, the Haldane simultaneous CI, the Score simultaneous CI based on the Bonferroni method and the simultaneous CI based on the bootstrap-resampling method perform satisfactorily and hence are recommended for applications. The CI based on the weighted-least-square (WLS) estimator, the CIs based on the Mantel-Haenszel estimator, the CI based on the Cochran statistic and the CI based on the Score statistic for the common risk difference behave well even under small sample sizes. A real data example is used to demonstrate the proposed methodologies.
Affiliation(s)
- Shi-Fang Qiu
  - Department of Statistics, Chongqing University of Technology, Chongqing, China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
- Man-Lai Tang
  - Department of Mathematics and Statistics, Hang Seng University of Hong Kong, Hong Kong, China
- Ji-Ran Tao
  - Qiushi College, Beijing Institute of Technology, Beijing, China
7
Abstract
Ordinal responses are common in clinical studies. Although the proportional odds model is a popular option for analyzing ordered-categorical data, it cannot control the type I error rate when the proportional odds assumption fails to hold. The latent Weibull model was recently shown to be a superior candidate for modeling ordinal data, with remarkably better performance than the latent normal model when the data are highly skewed. In clinical trials with ordinal responses, a balanced design is common, with equal sample allocation for each treatment. However, a more ethical approach is to adopt a response-adaptive allocation scheme in which more patients receive the better treatment. In this paper, we propose the use of the doubly adaptive biased coin design to generate treatment allocations that benefit the trial participants. The proposed treatment allocation scheme not only allows more patients to receive the better treatment, it also maintains compatible test power for the comparison of treatment efficiencies. A clinical example is used to illustrate the proposed procedure.
Affiliation(s)
- Tong-Yu Lu
  - College of Economics and Management, China Jiliang University, Hangzhou, China
- Ka Pui Chung
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong, China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong, China
- Siu Hung Cheung
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong, China
  - Department of Statistics, National Cheng Kung University, Tainan
8
Qiu SF, Zeng XS, Tang ML, Poon WY. Test procedure and sample size determination for a proportion study using a double-sampling scheme with two fallible classifiers. Stat Methods Med Res 2017; 28:1019-1043. [PMID: 29233082 DOI: 10.1177/0962280217744239] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
Abstract
Double sampling is usually applied to collect necessary information for situations in which an infallible classifier is available for validating a subset of the sample that has already been classified by a fallible classifier. Inference procedures have previously been developed based on the partially validated data obtained by the double-sampling process. However, it could happen in practice that such an infallible classifier or gold standard does not exist. In this article, we consider the case in which both classifiers are fallible and propose asymptotic and approximate unconditional test procedures based on six test statistics for a population proportion and five approximate sample size formulas based on the recommended test procedures under two models. Our results suggest that both asymptotic and approximate unconditional procedures based on the score statistic perform satisfactorily for small to large sample sizes and are highly recommended. When sample size is moderate or large, asymptotic procedures based on the Wald statistic with the variance being estimated under the null hypothesis, likelihood ratio statistic, log- and logit-transformation statistics based on both models generally perform well and are hence recommended. The approximate unconditional procedures based on the log-transformation statistic under Model I, Wald statistic with the variance being estimated under the null hypothesis, log- and logit-transformation statistics under Model II are recommended when sample size is small. In general, sample size formulae based on the Wald statistic with the variance being estimated under the null hypothesis, likelihood ratio statistic and score statistic are recommended in practical applications. The applicability of the proposed methods is illustrated by a real-data example.
Affiliation(s)
- Shi-Fang Qiu
  - Department of Statistics, Chongqing University of Technology, Chongqing, China
- Xiao-Song Zeng
  - Department of Statistics, Chongqing University of Technology, Chongqing, China
- Man-Lai Tang
  - Department of Mathematics and Statistics, Hang Seng Management College, Hong Kong, China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
11
Abstract
Partially validated series are common when a gold-standard test is too expensive to be applied to all subjects, and hence a fallible device is used accordingly to measure the presence of a characteristic of interest. In this article, confidence interval construction for proportion difference between two independent partially validated series is studied. Ten confidence intervals based on the method of variance estimates recovery (MOVER) are proposed, with each using the confidence limits for the two independent binomial proportions obtained by the asymptotic, Logit-transformation, Agresti–Coull and Bayesian methods. The performances of the proposed confidence intervals and three likelihood-based intervals available in the literature are compared with respect to the empirical coverage probability, confidence width and ratio of mesial non-coverage to non-coverage probability. Our empirical results show that (1) all confidence intervals exhibit good performance in large samples; (2) confidence intervals based on MOVER combining the confidence limits for binomial proportions based on Wilson, Agresti–Coull, Logit-transformation, Bayesian (with three priors) methods perform satisfactorily from small to large samples, and hence can be recommended for practical applications. Two real data sets are analysed to illustrate the proposed methods.
Affiliation(s)
- Shi-Fang Qiu
  - Department of Statistics, Chongqing University of Technology, Chongqing, China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
- Man-Lai Tang
  - Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
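To make the MOVER idea in the abstract of entry 11 concrete, here is a minimal sketch for the simplest case: the difference of two independent binomial proportions with Wilson limits for each proportion. It assumes fully observed binomial counts and does not implement the partially validated series estimators studied in the article; the function names `wilson_limits` and `mover_difference` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_limits(x, n, z):
    """Wilson score limits for a single binomial proportion x/n."""
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def mover_difference(x1, n1, x2, n2, conf_level=0.95):
    """MOVER interval for p1 - p2, recovering variances from Wilson limits."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_limits(x1, n1, z)
    l2, u2 = wilson_limits(x2, n2, z)
    diff = p1 - p2
    # Lower limit pairs the lower side of p1 with the upper side of p2.
    lower = diff - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    # Upper limit pairs the upper side of p1 with the lower side of p2.
    upper = diff + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return diff, lower, upper


# Illustrative counts only: 56/70 versus 48/80.
diff, lower, upper = mover_difference(56, 70, 48, 80)
```

Swapping `wilson_limits` for another single-proportion method (for example Agresti–Coull or a Bayesian credible interval) changes only the limits fed into the same combination step, which is what makes the approach modular.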
12
Abstract
Linear discriminant analysis has been widely applied in medical studies where atypical observations in a data set are usually encountered. While it is well known that the estimation in linear discriminant analysis can be conducted by using regression with dummy variates, typical regression diagnostic statistics cannot be applied to identify influential observations in discriminant analysis because these statistics are not invariant with regard to the codings of the dummy variates. We propose that regression model diagnostic measures developed from the local influence perspective can be used for identifying observations in a data set that exert undue influence on the result of the linear discriminant analysis. The measures are functions of the usual regression diagnostic statistics, such as leverage and residual, but are independent of the choice of the values of the dummy variate. They are local versions of Cook’s distance-type diagnostic statistic, and their advantage lies in their ability to detect a group of influential observations rather than a single one. The performance of the proposed measures is illustrated by analyses of three medical data sets and is compared with other diagnostic measures available in the literature. The results indicate that the proposed measures are simple and yet efficient discriminant diagnostic quantities. It is also observed from empirical evidence that a data point which is a multivariate outlier may not be influential in linear discriminant analysis.
Affiliation(s)
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong
13
Abstract
This article discusses the use of an approach advocated in the psychometrics and statistics literature for analyzing square contingency tables with ordered and comparable categories. Assuming that observed ordinal categorical variables are manifestations of underlying continuous variables, a model is formulated to compare the variables by studying the relative location and the relative dispersion of the underlying continuous variables. The formulation, interpretation, and analysis of the model are discussed, and the implementation of the proposed procedure using easily accessible software is addressed. The proposed approach is then compared with a widely adopted simple method that treats the ordinal measures as if they were interval scales. Analyses of real data and simulation results show that the simple method can be misleading and that the proposed approach is preferable in detecting variable differences.
14
Yang P, Hung Cheung S, Poon WY. Multiple comparisons with two controls for ordered categorical responses. J Biopharm Stat 2016; 27:111-123. [PMID: 26881877 DOI: 10.1080/10543406.2016.1148707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Abstract
In clinical studies, ordered categorical responses are common. To compare the efficacy of several treatments with a control for ordinal responses, the normal latent variable model has recently been proposed. This approach conceptualizes the responses as manifestations of an underlying continuous normal variable. In this article, we extend this idea to develop the multiple comparison method for use when there are two controls in the clinical trial. The proposed method is constructed such that the familywise type I error rate is controlled at a prespecified level. In addition, for a given level of test power, the procedure to evaluate the required sample size is provided. The proposed testing procedure is also illustrated by an example from a clinical study.
Affiliation(s)
- Ping Yang
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
- Siu Hung Cheung
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
15
Abstract
In clinical studies, the proportional odds model is widely used to compare treatment efficacies when the responses are categorically ordered. However, this model has been shown to be inappropriate when the proportional odds assumption is invalid, mainly because it is unable to control the type I error rate in such circumstances. To remedy this problem, the latent normal model was recently promoted and has been demonstrated to be superior to the proportional odds model. However, the application of the latent normal model is limited to comparing treatments whose underlying distributions are similar except possibly in their means and variances. When the underlying distributions are very different in skewness, both of the aforementioned procedures suffer from the undesirable inflation of the type I error rate. To solve the problem for clinical studies with ordinal responses, we provide a viable solution that relies on the use of the latent Weibull distribution, which is a member of the log-location-scale family. The proposed model is able to control the type I error rate regardless of the degree of skewness of the treatment responses. In addition, the power of the test also outperforms that of the latent normal model. The testing procedure draws on newly developed theoretical results related to latent distributions from the location-scale family. The testing procedure is illustrated with two clinical examples.
Affiliation(s)
- Tong-Yu Lu
  - College of Economics and Management, China Jiliang University, Hangzhou, China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong
- Siu Hung Cheung
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong
18
Abstract
Different latent variable models have been used to analyze ordinal categorical data which can be conceptualized as manifestations of an unobserved continuous variable. In this paper, we propose a unified framework based on a general latent variable model for the comparison of treatments with ordinal responses. The latent variable model is built upon the location-scale family and is rich enough to include many important existing models for analyzing ordinal categorical variables, including the proportional odds model, the ordered probit-type model, and the proportional hazards model. A flexible estimation procedure is proposed for the identification and estimation of the general latent variable model, which allows for the location and scale parameters to be freely estimated. The framework advances the existing methods by enabling many other popular models for analyzing continuous variables to be used to analyze ordinal categorical data, thus allowing for important statistical inferences such as location and/or dispersion comparisons among treatments to be conveniently drawn. Analysis on real data sets is used to illustrate the proposed methods.
Affiliation(s)
- Tong-Yu Lu
  - College of Economics and Management, China Jiliang University, Hangzhou, China
19
Lin Y, Kwong KS, Cheung SH, Poon WY. Step-up testing procedure for multiple comparisons with a control for a latent variable model with ordered categorical responses. Stat Med 2014; 33:3629-38. [PMID: 24757077 DOI: 10.1002/sim.6190] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/13/2013] [Revised: 01/08/2014] [Accepted: 04/07/2014] [Indexed: 11/11/2022]
Abstract
In clinical studies, multiple comparisons of several treatments to a control with ordered categorical responses are often encountered. A popular statistical approach to analyzing the data is to use the logistic regression model with the proportional odds assumption. As discussed in several recent research papers, if the proportional odds assumption fails to hold, the undesirable consequence of an inflated familywise type I error rate may affect the validity of the clinical findings. To remedy the problem, a more flexible approach that uses the latent normal model with single-step and stepwise testing procedures has been recently proposed. In this paper, we introduce a step-up procedure that uses the correlation structure of test statistics under the latent normal model. A simulation study demonstrates the superiority of the proposed procedure to all existing testing procedures. Based on the proposed step-up procedure, we derive an algorithm that enables the determination of the total sample size and the sample size allocation scheme with a pre-determined level of test power before the onset of a clinical trial. A clinical example is presented to illustrate our proposed method.
Affiliation(s)
- Yueqiong Lin
  - School of Economics and Management, Fuzhou University, Fuzhou, China
21
Abstract
Clinical trials frequently involve pairwise comparisons of different treatments to evaluate their relative efficacy. In this study, we examine methods for conducting pairwise tests of treatments with ordered categorical responses. A modified version of the Wilcoxon-Mann-Whitney test based on a logistic regression model assuming proportional odds is a popular choice for comparing two treatments. This paper discusses the extension of this test to pairwise comparisons involving more than two treatments. However, when the proportional odds assumption is not valid, the Wilcoxon-Mann-Whitney-type test procedure cannot control the overall type I error rate at the prespecified level of significance. We therefore propose a better strategy in which a latent normal model is employed. We present a simulated comparative study of power and the overall type I error rate to illustrate the superiority of the latent normal model. Examples are also given for illustrative purposes.
Affiliation(s)
- Yueqiong Lin
  - School of Management, Fuzhou University, Fuzhou, China
22
Abstract
Investigating the prevalence of a disease is an important topic in medical studies. Such investigations are usually based on the classification results of a group of subjects according to whether they have the disease. To classify subjects, screening tests that are inexpensive and nonintrusive to the test subjects are frequently used to produce results in a timely manner. However, such screening tests may suffer from high levels of misclassification. Although it is often possible to design a gold-standard test or device that is not subject to misclassification, such devices are usually costly and time-consuming, and in some cases intrusive to the test subjects. As a compromise between these two approaches, it is possible to use data that are obtained by the method of double-sampling. In this article, we derive and investigate four test statistics for testing a hypothesis on disease prevalence with double-sampling data. The test statistics are implemented through both the asymptotic method suitable for large samples and approximate unconditional method suitable for small samples. Our simulation results show that the approximate unconditional method usually produces a more satisfactory empirical type I error rate and power than its asymptotic counterpart, especially for small to moderate sample sizes. The results also suggest that the score test and the Wald test based on an estimate of variance with parameters estimated under the null hypothesis outperform the others. A real example is used to illustrate the proposed methods.
Affiliation(s)
- Man-Lai Tang
  - Department of Mathematics, Hong Kong Baptist University, Hong Kong
24
Abstract
Disease prevalence is an important topic in medical research, and its study is based on data that are obtained by classifying subjects according to whether a disease has been contracted. Classification can be conducted with high-cost gold standard tests or low-cost screening tests, but the latter are subject to the misclassification of subjects. As a compromise between the two, many research studies use partially validated datasets in which all data points are classified by fallible tests, and some of the data points are validated in the sense that they are also classified by the completely accurate gold-standard test. In this article, we investigate the determination of sample sizes for disease prevalence studies with partially validated data. We use two approaches. The first is to find sample sizes that can achieve a pre-specified power of a statistical test at a chosen significance level, and the second is to find sample sizes that can control the width of a confidence interval with a pre-specified confidence level. Empirical studies have been conducted to demonstrate the performance of various testing procedures with the proposed sample sizes. The applicability of the proposed methods is illustrated by a real-data example.
Affiliation(s)
- Shi-Fang Qiu
  - Department of Statistics, Chongqing University of Technology, China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, China
- Man-Lai Tang
  - Department of Mathematics, Hong Kong Baptist University, China
25
Abstract
Ordered categorical data are frequently encountered in clinical studies. A popular method for comparing the efficacy of treatments is to use logistic regression with the proportional odds assumption. The test statistic is based on the Wilcoxon-Mann-Whitney test. However, the proportional odds assumption may not be appropriate. In such cases, the probability of rejecting the null hypothesis is much inflated even though the treatments have the same mean efficacy. An alternative approach that does not rely on the proportional odds assumption is to conceptualize the responses as manifestations of some underlying continuous variables. However, statistical procedures were developed only for the comparison of two treatments. In this article, we derive testing procedures that compare several treatments to a control, utilizing a latent normal distribution with the latent variable model. The proposed procedure is useful because multiple comparisons with a control is very frequently an objective of a clinical study. Data from clinical trials are used to illustrate the proposed procedures.
Affiliation(s)
- Tong-Yu Lu
  - College of Economics and Management, China Jiliang University, Hangzhou, China
- Wai-Yin Poon
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong
- Siu Hung Cheung
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong
27
Lu TY, Poon WY, Tsang YF. Latent growth curve modeling for longitudinal ordinal responses with applications. Comput Stat Data Anal 2011. [DOI: 10.1016/j.csda.2010.10.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
28
|
Tang ML, Poon WY, Ling L, Liao Y, Chui HW. Approximate unconditional test procedure for comparing two ordered multinomials. Comput Stat Data Anal 2011. [DOI: 10.1016/j.csda.2010.08.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/19/2022]
|
29
|
Abstract
We develop a method for the analysis of multivariate ordinal categorical data with misclassification based on the latent normal variable approach. Misclassification arises if a subject has been classified into a category that does not truly reflect its actual state, and can occur with one or more variables. A basic framework is developed to enable the analysis of two types of data. The first corresponds to a single sample that is obtained from a fallible design that may lead to misclassified data. The other corresponds to data that is obtained by double sampling. Double sampling data consists of two parts: a sample that is obtained by classifying subjects using the fallible design only and a sample that is obtained by classifying subjects using both fallible and true designs, which is assumed to have no misclassification. A unified expectation-maximization approach is developed to find the maximum likelihood estimate of model parameters. Simulation studies and examples that are based on real data are used to demonstrate the applicability and practicability of the proposed methods.
Affiliation(s)
- Wai-Yin Poon
- Department of Statistics, Chinese University of Hong Kong, Shatin, Hong Kong, People's Republic of China.
|
30
|
Abstract
A Thurstonian type approach is applied to modelling ranking data with ties. It uses a non-totally differentiable discriminational process instead of the conventional totally differential one to relate the observed rankings and the underlying subjective values. A Monte Carlo expectation-maximization algorithm is proposed to find the maximum likelihood estimates together with the standard errors of the parameters. The approach is examined numerically by means of an artificial example and a simulation study and is applied to a study of attribute assessment.
Affiliation(s)
- Wai-Yin Poon
- Department of Statistics, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.
|
31
|
Abstract
Many variables that are used in social and behavioural science research are ordinal categorical or polytomous variables. When more than one polytomous variable is involved in an analysis, observations are classified in a contingency table, and a commonly used statistic for describing the association between two variables is the polychoric correlation. This paper investigates the estimation of the polychoric correlation when the data set consists of misclassified observations. Two approaches for estimating the polychoric correlation have been developed. One assumes that the probabilities in relation to misclassification are known, and the other uses a double sampling scheme to obtain information on misclassification. A parameter estimation procedure is developed, and statistical properties for the estimates are discussed. The practicability and applicability of the proposed approaches are illustrated by analysing data sets that are based on real and generated data. Excel programmes with Visual Basic for Applications (VBA) have been developed to compute the estimate of the polychoric correlation and its standard error. The use of the structural equation modelling programme Mx to find parameter estimates in the double sampling scheme is discussed.
|
32
|
Abstract
Influence analysis is an important component of data analysis, and the local influence approach has been widely applied to many statistical models to identify influential observations and assess minor model perturbations since the pioneering work of Cook (1986). The approach is often adopted to develop influence analysis procedures for factor analysis models with ranking data. However, as this well-known approach is based on the observed data likelihood, which involves multidimensional integrals, directly applying it to develop influence analysis procedures for the factor analysis models with ranking data is difficult. To address this difficulty, a Monte Carlo expectation and maximization algorithm (MCEM) is used to obtain the maximum-likelihood estimate of the model parameters, and measures for influence analysis on the basis of the conditional expectation of the complete data log likelihood at the E-step of the MCEM algorithm are then obtained. Very little additional computation is needed to compute the influence measures, because it is possible to make use of the by-products of the estimation procedure. Influence measures that are based on several typical perturbation schemes are discussed in detail, and the proposed method is illustrated with two real examples and an artificial example.
Affiliation(s)
- Liang Xu
- Department of Mathematics, South-east University, Nanjing, China
|
33
|
Poon WY, Sun Poon Y. Local Conditional Influence. J Appl Stat 2007. [DOI: 10.1080/02664760600744371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
34
|
|
35
|
Abstract
We consider the comparison of two treatments (or a treatment and a control/placebo) with responses that are classified into ordinal categories. By operating on the assumption that the responses are manifestations of some underlying continuous variables and that the definitions of the categories for the treatment group and the placebo group are the same in the same clinical test centre, we develop a model to examine the possible treatment effects. These treatment effects can be identified as location effect or dispersion effect. The method can be generalized to analyse clinical test results coming from different centres, where each centre may have its own standard in classifying responses. The method is technically undemanding and can be implemented in a very simple and straightforward way by using easily accessible software that can be downloaded at no cost. Real data sets are analysed for illustration.
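The idea of treating ordinal categories as manifestations of an underlying continuous variable can be illustrated with a minimal sketch. It is not the authors' model: it assumes a standard normal latent variable for a single group, and the function name `latent_thresholds` and the example counts are hypothetical. Under that assumption, each threshold is the normal quantile of the cumulative category proportion.

```python
from statistics import NormalDist


def latent_thresholds(counts):
    """Estimate thresholds of an assumed standard normal latent variable.

    counts: observed frequencies of the ordered categories (lowest first).
    Returns one threshold per boundary between adjacent categories.
    """
    # Empty categories would give cumulative proportions of 0 or 1,
    # for which the normal quantile is not finite.
    if len(counts) < 2 or any(c <= 0 for c in counts):
        raise ValueError("need at least two categories, each with a positive count")

    total = sum(counts)
    std_normal = NormalDist()  # assumption: latent variable is N(0, 1)
    thresholds = []
    cumulative = 0
    for c in counts[:-1]:  # the last category has no upper threshold
        cumulative += c
        thresholds.append(std_normal.inv_cdf(cumulative / total))
    return thresholds


# Hypothetical counts for a four-category ordinal response.
example = latent_thresholds([10, 30, 40, 20])
```

Comparing groups in this framework would then amount to holding the thresholds common across groups and letting the latent mean (location) and standard deviation (dispersion) differ, which this sketch does not attempt.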
Affiliation(s)
- Wai-Yin Poon
- Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong.
|
36
|
Lee SY, Song XY, Poon WY. Comparison of Approaches in Estimating Interaction and Quadratic Effects of Latent Variables. Multivariate Behav Res 2004; 39:37-67. [PMID: 26759934 DOI: 10.1207/s15327906mbr3901_2] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 06/05/2023]
Abstract
Various approaches using the maximum likelihood (ML) option of the LISREL program and products of indicators have been proposed to analyze structural equation models with non-linear latent effects on the basis of Kenny and Judd's formulation. Recently, some methods based on the Bayesian approach and the exact ML approaches have been developed. This article reviews, elaborates and compares several approaches for analyzing nonlinear models with interaction and/or quadratic effects. A total of four approaches are examined, including the product indicator ML approaches proposed by Jaccard and Wan (1995) and Jöreskog and Yang (1996), a Bayesian approach and an exact ML approach. The empirical performances of these approaches are assessed using simulation studies in terms of their capabilities in producing reliable parameter and standard error estimates. It is found that whilst the Bayesian and the exact ML approaches produce satisfactory results in all the settings under consideration and are in general very reliable, the product indicator ML approaches can only produce reasonable results in simple models with large sample sizes.
|
37
|
Poon WY, Ng SC. Identification of influential cells in the analysis of ordinal square tables. Br J Math Stat Psychol 2002; 55:231-46. [PMID: 12473226 DOI: 10.1348/000711002760554561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2023]
Abstract
For square tables arising from ordinal categorical variables which can be considered as manifestations of underlying continuous variables, it is possible to model the underlying continuous variables in a form that facilitates the comparison of their relative locations and dispersions. An efficient estimation method for such a model is available in the literature and the object of this paper is to develop an influence analysis procedure to accompany the estimation method. The local influence approach is used to obtain the diagnostic measures, and real data sets are analysed to illustrate the practicability of the proposed measures.
Affiliation(s)
- Wai-Yin Poon
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong.
|
38
|
Abstract
We study a multiple group model with ordinal categorical observed variables that are manifestations of underlying normal variables. When the objective of an analysis is to compare the locations and dispersions of the underlying continuous variables in different groups, traditional approaches use exact linear constraints on thresholds across groups to identify the model. Though the resultant model facilitates interpretation in multiple group analysis, in some cases the exact linear relationships on thresholds are not appropriate for describing the reality. However, these constraints must be imposed to identify the model. In view of this, we propose to apply across group stochastic constraints on thresholds to identify the model. Stochastic constraints are more practical and flexible than exact constraints, and subsume exact constraints as a special case, and therefore enable the structure of the data to be described in a more realistic way. Using stochastic constraints, we can achieve an identified model that allows the comparison of underlying continuous variables in different groups relatively, and at the same time accommodate the possible differences in thresholds. A Bayesian approach is employed to analyze the model, and prior knowledge can be incorporated into the analysis. It is demonstrated that the parameter estimates can be produced conveniently using the Mx software program, and an illustrative sample Mx input script is presented. A real data set is analyzed with the proposed approach, and results are compared to those obtained by using other prevailing approaches.
|
39
|
|
40
|
Abstract
Statistical procedures designed for analysing multivariate data sets often emphasize different sample statistics. While some procedures emphasize the estimates of both the mean vector mu and the covariance matrix Sigma, others may emphasize only one of these two sample quantities. In effect, while an unusual observation in a data set has a deleterious impact on the results from an analysis that depends heavily on the covariance matrix, its effect when dependence is on the mean vector may be minimal. The aim of this paper is to develop diagnostic measures for identifying influential observations of different kinds. Three diagnostic measures, based on the local influence approach, are constructed to identify observations that exercise undue influence on the estimate of mu, of Sigma, and of both together. Real data sets are analysed and results are presented to illustrate the effectiveness of the proposed measures.
Affiliation(s)
- Wai-Yin Poon
- Department of Statistics, The Chinese University of Hong Kong, Shatin.
|
41
|
|
42
|
Abstract
The local influence approach proposed by Cook (1986) makes use of the normal curvature and the direction achieving the maximum curvature to assess the local influence of minor perturbation of statistical models. When the approach is applied to the linear regression model, the result provides information concerning the data structure different from that contributed by Cook's distance. One of the main advantages of the local influence approach is its ability to handle the simultaneous effect of several cases, namely, the ability to address the problem of 'masking'. However, Lawrance (1995) points out that there are two notions of 'masking' effects, the joint influence and the conditional influence, which are distinct in nature. The normal curvature and the direction of maximum curvature are capable of addressing effects under the category of joint influences but not conditional influences. We construct a new measure to define and detect conditional local influences and use the linear regression model for illustration. Several reported data sets are used to demonstrate that new information can be revealed by this proposed measure.
Affiliation(s)
- W Y Poon
- Department of Statistics, Chinese University of Hong Kong, Shatin, Hong Kong.
|
43
|
Abstract
We make use of Cook's local influence approach and its recent modification by Poon and Poon to develop measures for detecting multivariate outliers. The motivation and the foundation of the theory are geometrical and are different from classical approaches; however, whilst the proposed measure exhibits a form similar to those in the literature, it still has a considerable advantage in having transformed the classical measures to the unit interval. The new approach unifies outlier identification measures using geometrical concepts. It involves no distributional assumption or large-sample properties, and allows the flexibility of identifying outliers with respect to different metrics. The approach therefore provides a valid reason for using the various measures in complicated situations, such as in non-normal cases and in small-sample problems.
Affiliation(s)
- W Y Poon
- Department of Statistics, Chinese University of Hong Kong, Shatin, Hong Kong.
|
44
|
Abstract
We analyse square contingency tables with ordered categories. Assuming that the observed ordinal categorical variables are manifestations of underlying continuous variables, we formulate a model which allows the comparisons of locations and dispersions between variables. We identify the model by imposing stochastic constraints on the thresholds that define the relationship between the observed and the underlying variables. As a result, the underlying continuous variables' location and dispersion parameters which were not estimable before can be estimated by the Bayesian approach. Illustrative examples are given based on several reported data sets.
Affiliation(s)
- W Y Poon
- Department of Statistics, Chinese University of Hong Kong, Shatin, Hong Kong.
|
45
|
|
46
|
|
47
|
Abstract
This paper develops a computationally efficient procedure for analysis of structural equation models with continuous and polytomous variables. A partition maximum likelihood approach is used to obtain the first stage estimates of the thresholds and the polyserial and polychoric correlations in the underlying correlation matrix. Then, based on the joint asymptotic distribution of the first stage estimator and an appropriate weight matrix, a generalized least squares approach is employed to estimate the structural parameters in the correlation structure. Asymptotic properties of the estimators are derived. Some simulation studies are conducted to study the empirical behaviours and robustness of the procedure, and compare it with some existing methods.
Affiliation(s)
- S Y Lee
- Department of Statistics, Chinese University of Hong Kong, Shatin, NT, Hong Kong
|
48
|
Lee SY, Poon WY, Bentler PM. Covariance and correlation structure analyses with continuous and polytomous variables. Institute of Mathematical Statistics Lecture Notes - Monograph Series 1994. [DOI: 10.1214/lnms/1215463807] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 12/03/2022]
|
49
|
|
50
|
|