1
Li M, Gao Q, Yang J, Yu T. Evaluating inter-rater reliability in the context of "Sysmex UN2000 detection of protein/creatinine ratio and of renal tubular epithelial cells can be used for screening lupus nephritis": a statistical examination. BMC Nephrol 2024; 25:94. [PMID: 38481181 PMCID: PMC10938658 DOI: 10.1186/s12882-024-03540-y]
Abstract
BACKGROUND The evaluation of inter-rater reliability (IRR) is integral to research designs in which two raters provide observational ratings. However, the existing literature is heterogeneous in how it reports statistical procedures and evaluates IRR, even though this information can affect subsequent hypothesis-testing analyses. METHODS This paper evaluates a recent publication by Chen et al., featured in BMC Nephrology, aiming to introduce an alternative statistical approach to assessing IRR and to discuss its statistical properties. The study underscores the need to select appropriate Kappa statistics and to compute, interpret, and report the commonly used IRR statistics between two raters accurately. RESULTS Cohen's Kappa statistic is typically used for two raters when there are two categories, or for unordered categorical variables with three or more categories. When assessing concordance between two raters for ordered categorical variables with three or more categories, the commonly employed measure is the weighted Kappa. CONCLUSION Chen and colleagues may have underestimated the agreement between the AU5800 and the UN2000. Although the statistical approach adopted in Chen et al.'s research did not alter their findings, researchers should be discerning in their choice of statistical techniques to address their specific research questions.
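The distinction the abstract draws can be sketched in a few lines of Python. The ratings below are invented for illustration (not data from the paper); scikit-learn's cohen_kappa_score supports both the unweighted and the weighted forms.

```python
# Illustrative sketch: unweighted vs. weighted Cohen's Kappa for two raters
# on an ordered 3-category scale. The ratings are invented, not study data.
from sklearn.metrics import cohen_kappa_score

rater_a = [0, 1, 2, 1, 0, 2, 1, 1, 2, 0]
rater_b = [0, 2, 2, 1, 0, 1, 1, 0, 2, 0]

# Unweighted Kappa treats every disagreement as equally severe --
# appropriate for two categories or unordered categories.
kappa = cohen_kappa_score(rater_a, rater_b)

# Linear weights penalize near-misses less than distant misses --
# appropriate for ordered categories.
kappa_w = cohen_kappa_score(rater_a, rater_b, weights="linear")

print(round(kappa, 3), round(kappa_w, 3))  # 0.552 0.659
```

On ordered data with mostly one-step disagreements, the weighted statistic is higher than the unweighted one, which is exactly the underestimation the comment describes.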
Affiliation(s)
- Ming Li
- Department of Software Engineering, College of Computer Science and Technology, Harbin Engineering University, 150001, Harbin, China
- Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University, 161006, Qiqihar, China
- Qian Gao
- Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University, 161006, Qiqihar, China
- Jing Yang
- Department of Software Engineering, College of Computer Science and Technology, Harbin Engineering University, 150001, Harbin, China
- Tianfei Yu
- Heilongjiang Provincial Key Laboratory of Resistance Gene Engineering and Protection of Biodiversity in Cold Areas, Qiqihar University, 161006, Qiqihar, China
- Department of Biotechnology, College of Life Science and Agriculture Forestry, Qiqihar University, 161006, Qiqihar, China
2
Li M, Gao Q, Yu T. Kappa statistic considerations in evaluating inter-rater reliability between two raters: which, when and context matters. BMC Cancer 2023; 23:799. [PMID: 37626309 PMCID: PMC10464133 DOI: 10.1186/s12885-023-11325-z]
Abstract
BACKGROUND In research designs that rely on observational ratings provided by two raters, assessing inter-rater reliability (IRR) is a frequently required task. However, some studies fall short in properly applying statistical procedures, omit information necessary for interpreting their findings, or inadequately address the impact of IRR on the statistical power of subsequent hypothesis-testing analyses. METHODS This article examines the recent publication by Liu et al. in BMC Cancer, analyzing the controversy surrounding the Kappa statistic and methodological issues in the assessment of IRR. The primary focus is on the appropriate selection of Kappa statistics, as well as the computation, interpretation, and reporting of two frequently used IRR statistics when two raters are involved. RESULTS Cohen's Kappa statistic is typically used to assess the level of agreement between two raters when there are two categories, or for unordered categorical variables with three or more categories. When evaluating the degree of agreement between two raters for ordered categorical variables comprising three or more categories, the weighted Kappa is the widely used measure. CONCLUSION Although it did not substantially affect the findings of Liu et al.'s study, the statistical dispute underscores the importance of employing suitable statistical methods. Rigorous and accurate statistical results are crucial for producing trustworthy research.
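As a sketch of what the weighted Kappa actually computes, the statistic can be written out directly from its definition, kappa_w = 1 - sum(W*O)/sum(W*E). The function and the ratings below are illustrative, not taken from either paper.

```python
# From-scratch sketch of weighted Cohen's Kappa for two raters on an
# ordered K-category scale. Illustrative only.
import numpy as np

def weighted_kappa(r1, r2, n_cat, weights="linear"):
    r1, r2 = np.asarray(r1), np.asarray(r2)
    n = len(r1)
    # O: observed joint proportions; E: chance-expected proportions
    O = np.zeros((n_cat, n_cat))
    for a, b in zip(r1, r2):
        O[a, b] += 1.0 / n
    E = np.outer(np.bincount(r1, minlength=n_cat) / n,
                 np.bincount(r2, minlength=n_cat) / n)
    # Disagreement weights grow with the distance between categories
    i, j = np.indices((n_cat, n_cat))
    W = np.abs(i - j) if weights == "linear" else (i - j) ** 2
    W = W / W.max()
    return 1.0 - (W * O).sum() / (W * E).sum()

# Under linear weights a one-step disagreement (2 vs. 1) costs half as
# much as a two-step one (0 vs. 2):
k = weighted_kappa([0, 0, 1, 2, 2], [0, 1, 1, 2, 1], n_cat=3)  # ~ 0.545
```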
Affiliation(s)
- Ming Li
- Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University, Qiqihar, 161006, China
- Qian Gao
- Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University, Qiqihar, 161006, China
- Tianfei Yu
- Department of Biotechnology, College of Life Science and Agriculture Forestry, Qiqihar University, Qiqihar, 161006, China
3
Li M, Gao Q, Yu T. Using appropriate Kappa statistic in evaluating inter-rater reliability. Short communication on "Groundwater vulnerability and contamination risk mapping of semi-arid Totko river basin, India using GIS-based DRASTIC model and AHP techniques". Chemosphere 2023; 328:138565. [PMID: 37011819 DOI: 10.1016/j.chemosphere.2023.138565]
Abstract
In this article, some misuses of the Kappa statistic in the original paper [Chemosphere, 307, 135831] are discussed. Using DRASTIC and Analytic Hierarchy Process (AHP) models, the authors assessed the groundwater vulnerability of the Totko river basin, India. High nitrate concentrations in groundwater were found in highly vulnerable areas, and the accuracy of the models was assessed through Pearson's correlation coefficient and the Kappa coefficient. However, using Cohen's Kappa to estimate the inter-rater reliabilities (IRRs) of the two models is not appropriate, because the categorical variables in the original paper are ordinal with five categories. We briefly introduce the Kappa statistic and propose using the weighted Kappa to compute IRRs under such conditions. We recognize that this does not significantly alter the conclusions of the original paper, but it is necessary to ensure that the appropriate statistical tools are used.
Affiliation(s)
- Ming Li
- Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University, Qiqihar, 161006, China
- Qian Gao
- Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University, Qiqihar, 161006, China
- Tianfei Yu
- Department of Biotechnology, College of Life Science and Agriculture Forestry, Qiqihar University, Qiqihar, 161006, China; Heilongjiang Provincial Key Laboratory of Resistance Gene Engineering and Protection of Biodiversity in Cold Areas, Qiqihar University, Qiqihar, 161006, China
4
Schade GW. Critique of Well Activity Proxy Uses Inadequate Data and Statistics. Int J Environ Res Public Health 2020; 17:E5597. [PMID: 32756437 DOI: 10.3390/ijerph17155597]
Abstract
The recent publication, "Assessing Agreement in Exposure Classification between Proximity-Based Metrics and Air Monitoring Data in Epidemiology Studies of Unconventional Resource Development" by Hess et al. [...].
5
Ho GY, Leonhard M, Volk GF, Foerster G, Pototschnig C, Klinge K, Granitzka T, Zienau AK, Schneider-Stickler B. Inter-rater reliability of seven neurolaryngologists in laryngeal EMG signal interpretation. Eur Arch Otorhinolaryngol 2019; 276:2849-2856. [PMID: 31312924 PMCID: PMC6757022 DOI: 10.1007/s00405-019-05553-y]
Abstract
Purpose Laryngeal electromyography (LEMG) is considered the gold standard in the diagnosis of vocal fold movement impairment, but it is still not commonly implemented in clinical routine. Since the interpretation of LEMG signals (LEMGs) is often subjective and semi-quantitative, the goal of this study was to evaluate the inter-rater reliability of neurolaryngologists rating LEMGs of volitional muscle activity. Methods For this study, 52 representative LEMGs from 371 LEMG datasets were selected from a multicenter registry for blinded evaluation by 7 experienced members of the neurolaryngology working group of the European Laryngological Society (ELS). Observer agreement between pairs of raters was measured with Cohen's Kappa statistic; agreement of diagnoses among all seven examiners was measured with Fleiss' Kappa statistic. Results When focusing on the categories "no activity", "single fiber pattern", and "strongly decreased recruitment pattern", inter-rater agreement varied between Cohen's Kappa values of 0.48 and 0.84, indicating moderate to near-perfect agreement between rater pairs. Fleiss' Kappa across all seven raters was 0.61, indicating good agreement; across the rating categories, the Fleiss' Kappa value ranged from 0.52 to 0.74, also indicating good agreement. Conclusion Good inter-rater agreement between the participating neurolaryngologists was achieved in the interpretation of LEMGs. More instructional courses should be offered to implement LEMG broadly as a reliable diagnostic tool for evaluating vocal fold movement disorders in clinical routine and to develop future algorithms for therapy and computer-assisted examination.
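For readers who want to reproduce this kind of multi-rater analysis, Fleiss' Kappa can be computed from a subjects-by-categories count table. The sketch below uses invented counts, not the study's LEMG ratings.

```python
# Sketch of Fleiss' Kappa for m raters, from invented data (not the study's).
# table[i, j] = number of raters who assigned subject i to category j.
import numpy as np

def fleiss_kappa(table):
    table = np.asarray(table, dtype=float)
    n_sub = table.shape[0]
    m = table[0].sum()                            # raters per subject (constant)
    p_j = table.sum(axis=0) / (n_sub * m)         # overall category shares
    P_i = ((table ** 2).sum(axis=1) - m) / (m * (m - 1))  # per-subject agreement
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()
    return (P_bar - P_e) / (1.0 - P_e)

# 4 subjects, 3 raters, 2 categories:
k = fleiss_kappa([[3, 0], [3, 0], [1, 2], [0, 3]])  # ~ 0.657
```

Unlike Cohen's Kappa, which is defined for rater pairs, this statistic summarizes agreement across all raters at once, which is how the single value of 0.61 for seven raters arises.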
Affiliation(s)
- Guan-Yuh Ho
- Division of Phoniatrics-Logopedics, Department of Otorhinolaryngology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Matthias Leonhard
- Division of Phoniatrics-Logopedics, Department of Otorhinolaryngology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Gerd Fabian Volk
- Department of Otorhinolaryngology, Jena University Hospital, Jena, Germany
- Gerhard Foerster
- Department of Otorhinolaryngology, SHR Wald-Klinikum Gera, Gera, Germany
- Claus Pototschnig
- Department of Otorhinolaryngology, University of Innsbruck, Innsbruck, Austria
- Kathleen Klinge
- Department of Otorhinolaryngology, SHR Wald-Klinikum Gera, Gera, Germany
- Thordis Granitzka
- Department of Otorhinolaryngology, Jena University Hospital, Jena, Germany
- Berit Schneider-Stickler
- Division of Phoniatrics-Logopedics, Department of Otorhinolaryngology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
6
Tollafield DR. Clinical photographic observation of plantar corns and callus associated with a nominal scale classification and inter-observer reliability study in a student population. J Foot Ankle Res 2017; 10:45. [PMID: 29046725 PMCID: PMC5639769 DOI: 10.1186/s13047-017-0225-2]
Abstract
BACKGROUND The management of plantar corns and callus has a low cost-benefit ratio and a reduced priority in healthcare. The distinction between the types of keratin lesion that form corns and callus has attracted limited interest. Observation is imperative to improving diagnostic predictions, and a number of studies point to confusion as to how best to achieve this. The use of photographic observation has been proposed to improve our understanding of intractable keratin lesions. METHODS Students from a podiatry school reviewed photographs in which plantar keratin lesions were divided into four nominal groups: light callus (Grade 1), heavy defined callus (Grade 2), concentric keratin plugs (Grade 3), and callus with deeper density changes under the forefoot (Grade 4). A group of 'experts' drawn from qualified podiatrists validated the observer-rated responses of the students. RESULTS Cohen's weighted Kappa statistic (k) was used to measure inter-observer reliability. First-year students (unskilled) performed less well when viewing photographs (k = 0.33) than third-year students (semi-skilled, k = 0.62). The experts performed better than the students (k = 0.88), consistent with wound-care models in other studies. CONCLUSIONS Improved annotation of clinical features, supported by a classification of keratin-based lesions and combined with patient outcome tools, could improve the scientific rationale for prioritising patient care. A problem with photographic assessment is differentiating similar lesions without the benefit of direct palpation. Direct observation of callus with and without debridement requires further investigation alongside the model proposed in this paper.
Affiliation(s)
- David R Tollafield
- Spire Hospital Little Aston, Little Aston Hall Lane, B7 3UP, Sutton Coldfield, West Midlands, UK
7
Hon CY, Danyluk Q, Bryce E, Janssen B, Neudorf M, Yassi A, Shen H, Astrakianakis G. Comparison of qualitative and quantitative fit-testing results for three commonly used respirators in the healthcare sector. J Occup Environ Hyg 2017; 14:175-179. [PMID: 27717300 DOI: 10.1080/15459624.2016.1237030]
Abstract
N95 filtering facepiece respirators are used by healthcare workers when there is a risk of exposure to airborne hazards during aerosol-generating procedures. Respirator fit-testing is required prior to use to ensure that the selected respirator provides an adequate face seal. Two common fit-test methods can be employed: the qualitative fit-test (QLFT) and the quantitative fit-test (QNFT). Respiratory protection standards deem both to be acceptable. However, previous studies have indicated that results may differ between QLFT and QNFT and that outcomes may also be influenced by the respirator model. The aim of this study was to determine whether fit-test outcomes differ within our suite of respirators (3M 1860S, 1860, and 1870) and whether the model affects the results. Subjects were recruited from residential care facilities. Each participant was assigned a respirator and underwent sequential QLFT and QNFT, and the results (pass or fail) were recorded. To ascertain the degree of agreement between the two fit-tests, a Kappa (K) statistic was computed as per the American National Standards Institute (ANSI) respiratory protection standard. Pass-fail rates were stratified by respirator model, and a Kappa statistic was calculated for each model to determine its effect on fit-test outcomes. There were 619 participants, and the aggregate K statistic across all respirators was 0.63, below the suggested ANSI threshold of 0.70. There was no statistically significant difference in results when stratified by respirator model. QNFT and QLFT produced different fit-test outcomes for the three respirator models examined; the disagreement between the two methods with our suite of N95 filtering facepiece respirators was approximately 12%. Our findings may benefit other healthcare organizations that use these three respirators.
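The ANSI-style comparison reduces to a 2x2 pass/fail table, and Kappa can be computed directly from its four cells. The function and the counts in the usage line are hypothetical, not the study's 619-participant data.

```python
# Sketch: Cohen's Kappa from an aggregated 2x2 pass/fail table.
# a = both tests pass, b = QLFT pass only, c = QNFT pass only, d = both fail.
def kappa_2x2(a, b, c, d):
    n = a + b + c + d
    p_o = (a + d) / n                                        # observed agreement
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # chance agreement
    return (p_o - p_e) / (1 - p_e)

k = kappa_2x2(40, 5, 5, 50)  # hypothetical counts; ~ 0.798
```

Note that the percent disagreement (b + c as a share of n) and Kappa answer different questions: the former ignores chance agreement, the latter corrects for it.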
Affiliation(s)
- Chun-Yip Hon
- Worksafe and Wellness, Vancouver Coastal Health, Vancouver, British Columbia, Canada
- School of Occupational and Public Health, Ryerson University, Toronto, Ontario, Canada
- Quinn Danyluk
- Workplace Health, Fraser Health, Surrey, British Columbia, Canada
- Elizabeth Bryce
- Medical Microbiology and Infection Control, Vancouver Coastal Health, Vancouver, British Columbia, Canada
- Bob Janssen
- Policy, Regulation & Research Division, WorkSafeBC, Richmond, British Columbia, Canada
- Mike Neudorf
- Workplace Health, Fraser Health, Surrey, British Columbia, Canada
- Annalee Yassi
- School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
- Hui Shen
- School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
- George Astrakianakis
- School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
8
Zhang RF, Fu YC, Lu Y, Zhang XX, Hu YM, Zhou YJ, Tian NF, He JW, Yan ZH. What is the optimal cutoff value of the axis-line-angle technique for evaluating trunk imbalance in coronal plane? Spine J 2017; 17:230-235. [PMID: 27664342 DOI: 10.1016/j.spinee.2016.09.012]
Abstract
BACKGROUND CONTEXT Accurately evaluating the extent of trunk imbalance in the coronal plane is important for patients before and after treatment. We previously introduced a new method, the axis-line-angle technique (ALAT), for evaluating coronal trunk imbalance with excellent intra-observer and interobserver reliability, and encouraged radiologists and surgeons to use it in clinical practice. However, the optimal cutoff value of the ALAT for determining the extent of coronal trunk imbalance has not yet been established. PURPOSE The purpose of this study was to identify the cutoff value of the ALAT that best predicts a positive measurement point for assessing coronal balance or imbalance. STUDY DESIGN/SETTING A retrospective study at a university-affiliated hospital. PATIENT SAMPLE A total of 130 patients with C7-central sacral vertical line (CSVL) >0 mm, aged 10-18 years, were recruited from September 2013 to December 2014. OUTCOME MEASURES Data were analyzed to determine the optimal cutoff value of the ALAT measurement. METHODS The C7-CSVL and ALAT measurements were each conducted twice on plain film, within a 2-week interval, by two radiologists. The optimal cutoff value of the ALAT was determined via a receiver operating characteristic (ROC) curve. Comparisons between the C7-CSVL and ALAT measurements for evaluating trunk imbalance were performed with the chi-square test. The Kappa agreement coefficient was used to test the intra-observer and interobserver agreement of the C7-CSVL and ALAT. RESULTS The area under the ROC curve for the ALAT was 0.82 (95% confidence interval: 0.753-0.894, p<.001). The maximum Youden index was 0.51, and the corresponding cutoff point was 2.59°.
Intra-observer agreement values for the C7-CSVL measurements by observers 1 and 2 were 0.79 and 0.91 (p<.001), respectively, whereas intra-observer agreement values for the ALAT measurements were both 0.89 by observers 1 and 2 (p<.001). The interobserver agreement values for the first and second measurements with the C7-CSVL were 0.78 and 0.85 (p<.001), respectively, whereas the interobserver agreement values for the first and second measurements with the ALAT were 0.91 and 0.88 (p<.001), respectively. CONCLUSIONS The newly developed ALAT provided an acceptable optimal cutoff value for evaluating trunk imbalance in the coronal plane with a high level of intra-observer and interobserver agreement, which suggests that the ALAT is suitable for clinical use.
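The cutoff-selection step described here, maximizing the Youden index over the ROC curve, can be sketched as follows. The labels and angle scores are simulated stand-ins, not the ALAT data.

```python
# Sketch: choose the threshold maximizing Youden's J = sensitivity + specificity - 1.
# Labels/scores are simulated stand-ins for balanced vs. imbalanced cases.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
labels = np.r_[np.zeros(100), np.ones(100)].astype(int)
scores = np.r_[rng.normal(1.5, 1.0, 100),   # balanced cases: smaller angles
               rng.normal(3.5, 1.5, 100)]   # imbalanced cases: larger angles

fpr, tpr, thresholds = roc_curve(labels, scores)
j = tpr - fpr                 # Youden's J at each candidate threshold
best = int(j.argmax())
print(f"optimal cutoff ~ {thresholds[best]:.2f} deg, J = {j[best]:.2f}")
```

The chosen point is the one farthest above the chance diagonal, i.e. the "elbow" of the ROC curve; with real data the 2.59 degree cutoff reported above would emerge from exactly this maximization.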
Affiliation(s)
- Rui-Fang Zhang
- Department of Radiology, Children's Hospital, Zhejiang University School of Medicine, 3333 Binsheng Rd, 310052 Hangzhou, China
- Yu-Chuan Fu
- Department of Radiology, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, 109 Xueyuanxi Rd, 325027 Wenzhou, China
- Yi Lu
- Department of Radiology, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, 109 Xueyuanxi Rd, 325027 Wenzhou, China
- Xiao-Xia Zhang
- Department of Radiology, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, 109 Xueyuanxi Rd, 325027 Wenzhou, China
- Yu-Min Hu
- Department of Radiology, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, 109 Xueyuanxi Rd, 325027 Wenzhou, China
- Yong-Jin Zhou
- Department of Radiology, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, 109 Xueyuanxi Rd, 325027 Wenzhou, China
- Nai-Feng Tian
- Department of Spine Surgery, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, 109 Xueyuanxi Rd, 325027 Wenzhou, China
- Jia-Wei He
- Department of Radiology, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, 109 Xueyuanxi Rd, 325027 Wenzhou, China
- Zhi-Han Yan
- Department of Radiology, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, 109 Xueyuanxi Rd, 325027 Wenzhou, China
9
Abstract
Crucial therapeutic decisions are based on diagnostic tests. Therefore, it is important to evaluate such tests before adopting them for routine use. Although blood tests, cultures, biopsies, and radiological imaging are obvious diagnostic tests, it should not be forgotten that specific clinical examination procedures, scoring systems based on physiological or psychological evaluation, and ratings based on questionnaires are also diagnostic tests and merit similar evaluation. In the simplest scenario, a diagnostic test gives either a positive (disease likely) or negative (disease unlikely) result. Ideally, all those with the disease should be classified by a test as positive and all those without the disease as negative. Unfortunately, practically no test gives 100% accurate results. Therefore, leaving aside the economic question, the performance of diagnostic tests is evaluated using indices such as sensitivity, specificity, positive predictive value, and negative predictive value. Likelihood ratios combine information on specificity and sensitivity to express how much more likely a given test result is in a subject with the disorder than in a subject without it. Not all tests can be categorized simply as "positive" or "negative." Physicians are frequently exposed to test results on a numerical scale, and in such cases judgment is required in choosing a cutoff point to distinguish normal from abnormal. Naturally, a cutoff value should provide the greatest predictive accuracy, but there is a trade-off between sensitivity and specificity here - if the cutoff is too low, it will identify most patients who have the disease (high sensitivity) but will also incorrectly identify many who do not (low specificity).
A receiver operating characteristic curve plots pairs of sensitivity versus (1 - specificity) values and helps in selecting an optimum cutoff - the one lying on the "elbow" of the curve. Cohen's kappa (κ) statistic is a measure of inter-rater agreement for categorical variables. It can also be applied to assess how far two tests agree with respect to diagnostic categorization. It is generally thought to be a more robust measure than simple percent agreement calculation since kappa takes into account the agreement occurring by chance.
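The indices this passage defines all fall out of a single 2x2 table of test result versus true disease status. A minimal sketch, with hypothetical counts:

```python
# Sketch: standard diagnostic indices from a 2x2 table of test vs. disease.
# The tp/fp/fn/tn counts used below are hypothetical.
def diagnostic_indices(tp, fp, fn, tn):
    sens = tp / (tp + fn)            # sensitivity: positives among diseased
    spec = tn / (tn + fp)            # specificity: negatives among healthy
    ppv = tp / (tp + fp)             # positive predictive value
    npv = tn / (tn + fn)             # negative predictive value
    lr_pos = sens / (1 - spec)       # likelihood ratio of a positive result
    lr_neg = (1 - sens) / spec       # likelihood ratio of a negative result
    return sens, spec, ppv, npv, lr_pos, lr_neg

sens, spec, ppv, npv, lr_pos, lr_neg = diagnostic_indices(90, 30, 10, 70)
# With these counts, sens = 0.90 and spec = 0.70, so LR+ = 3.0: a positive
# result is three times as likely in a diseased subject as in a healthy one.
```

Unlike sensitivity and specificity, the predictive values depend on disease prevalence in the tested population, which is why the likelihood ratios are often the more portable summary.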
Affiliation(s)
- Avijit Hazra
- Department of Pharmacology, Institute of Postgraduate Medical Education and Research, Kolkata, West Bengal, India
- Nithya Gogtay
- Department of Clinical Pharmacology, Seth GS Medical College and KEM Hospital, Mumbai, Maharashtra, India
10
Kang C, Qaqish B, Monaco J, Sheridan SL, Cai J. Kappa statistic for clustered dichotomous responses from physicians and patients. Stat Med 2013; 32:3700-19. [PMID: 23533082 DOI: 10.1002/sim.5796]
Abstract
The bootstrap method for estimating the standard error of the Kappa statistic in the presence of clustered data is evaluated. Such data arise, for example, in assessing agreement between physicians and their patients regarding their understanding of the physician-patient interaction and discussions. We propose a computationally efficient procedure for generating correlated dichotomous responses for physicians and assigned patients for simulation studies. The simulation results demonstrate that, with at least a moderately large number of clusters, the proposed bootstrap method produces a better estimate of the standard error and better coverage than the asymptotic standard error estimate, which ignores dependence among patients within physicians. We present an application to a coronary heart disease prevention study.
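A minimal sketch of the cluster bootstrap idea, resampling physicians with replacement while keeping each physician's patients together, is below. The data layout and function are illustrative assumptions, not the authors' procedure verbatim.

```python
# Sketch: cluster bootstrap SE of Cohen's Kappa for clustered dichotomous
# ratings. clusters[i] = (physician_ratings, patient_ratings), two 0/1 arrays
# covering physician i's patients. Illustrative, not the paper's method.
import numpy as np

def kappa(pairs):
    x = np.concatenate([p for p, _ in pairs])
    y = np.concatenate([q for _, q in pairs])
    p_o = np.mean(x == y)                       # observed agreement
    px, py = x.mean(), y.mean()
    p_e = px * py + (1 - px) * (1 - py)         # chance agreement (binary)
    return (p_o - p_e) / (1 - p_e)

def cluster_bootstrap_kappa_se(clusters, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        # Resample whole physicians, preserving within-physician dependence
        idx = rng.integers(0, len(clusters), len(clusters))
        stats.append(kappa([clusters[i] for i in idx]))
    return float(np.std(stats, ddof=1))
```

Resampling individual patients instead of whole clusters would break the within-physician correlation and, as the paper argues, understate the standard error.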
Affiliation(s)
- Chaeryon Kang
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA