1
Zhao X, Feng GC, Ao SH, Liu PL. Interrater reliability estimators tested against true interrater reliabilities. BMC Med Res Methodol 2022; 22:232. PMID: 36038846; PMCID: PMC9426226; DOI: 10.1186/s12874-022-01707-5.
Abstract
BACKGROUND Interrater reliability, aka intercoder reliability, is defined as true agreement between raters, aka coders, without chance agreement. It is used across many disciplines, including medical and health research, to measure the quality of ratings, coding, diagnoses, or other observations and judgements. While numerous indices of interrater reliability are available, experts disagree on which are legitimate or more appropriate. Almost all agree that percent agreement (ao), the oldest and simplest index, is also the most flawed, because it fails to estimate and remove chance agreement, which is produced by raters' random rating. The experts, however, disagree on which chance estimators are legitimate or better. They also disagree on which of the three factors (rating category, distribution skew, or task difficulty) an index should rely on to estimate chance agreement, or which factors the known indices in fact rely on. The most popular chance-adjusted indices, according to a functionalist view of mathematical statistics, assume that all raters conduct intentional and maximum random rating, while typical raters conduct involuntary and reluctant random rating. The mismatches between the assumed and actual rater behaviors cause the indices to rely on mistaken factors to estimate chance agreement, leading to the numerous paradoxes, abnormalities, and other misbehaviors of the indices identified by prior studies. METHODS We conducted a 4 × 8 × 3 between-subject controlled experiment with 4 subjects per cell. Each subject was a rating session with 100 pairs of ratings by two raters, totaling 384 rating sessions as the experimental subjects. The experiment tested seven best-known indices of interrater reliability against the observed reliabilities and chance agreements. Impacts of the three factors, i.e., rating category, distribution skew, and task difficulty, on the indices were tested.
RESULTS The most criticized index, percent agreement (ao), proved the most accurate predictor of reliability, reporting directional r2 = .84. It was also the third-best approximator, overestimating observed reliability by 13 percentage points on average. The three most acclaimed and most popular indices, Scott's π, Cohen's κ, and Krippendorff's α, underperformed all other indices, reporting directional r2 = .312 and underestimating reliability by 31.4~31.8 points. The newest index, Gwet's AC1, emerged as the second-best predictor and the most accurate approximator. Bennett et al.'s S ranked behind AC1, and Perreault and Leigh's Ir ranked fourth for both prediction and approximation. The reliance on category and skew, and the failure to rely on difficulty, explain why the six chance-adjusted indices often underperformed ao, which they were created to outperform. The evidence corroborated the notion that the chance-adjusted indices assume intentional and maximum random rating, while the raters instead exhibited involuntary and reluctant random rating. CONCLUSION The authors call for more empirical studies, and especially more controlled experiments, to falsify or qualify this study. If the main findings are replicated and the underlying theories supported, new thinking and new indices may be needed. Index designers may need to refrain from assuming intentional and maximum random rating, and instead assume involuntary and reluctant random rating. Accordingly, the new indices may need to rely on task difficulty, rather than distribution skew or rating category, to estimate chance agreement.
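For reference, the chance-adjustment logic the abstract contrasts can be made concrete for two raters and nominal categories. The sketch below (toy data, not the paper's experimental sessions) computes ao together with Scott's π, Cohen's κ, and Gwet's AC1, which differ only in how they estimate chance agreement:

```python
from collections import Counter

def agreement_indices(r1, r2):
    """Percent agreement (ao), Scott's pi, Cohen's kappa, and Gwet's AC1
    for two raters coding the same items into nominal categories."""
    n = len(r1)
    cats = sorted(set(r1) | set(r2))
    ao = sum(a == b for a, b in zip(r1, r2)) / n          # raw percent agreement
    p1, p2 = Counter(r1), Counter(r2)
    # Each index's estimate of chance agreement:
    pe_pi = sum(((p1[c] + p2[c]) / (2 * n)) ** 2 for c in cats)   # Scott: pooled margins
    pe_k = sum((p1[c] / n) * (p2[c] / n) for c in cats)           # Cohen: separate margins
    pbar = [(p1[c] + p2[c]) / (2 * n) for c in cats]
    pe_ac1 = sum(p * (1 - p) for p in pbar) / (len(cats) - 1)     # Gwet: shrinks with skew
    adj = lambda pe: (ao - pe) / (1 - pe)                 # common chance-removal formula
    return {"ao": ao, "pi": adj(pe_pi), "kappa": adj(pe_k), "AC1": adj(pe_ac1)}

r1 = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no"]
r2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "yes"]
print(agreement_indices(r1, r2))
```

With equal marginals, as in this toy example, π and κ coincide; AC1's chance term shrinks as the distribution grows more skewed, which is why it tracks ao more closely on skewed data.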
Affiliation(s)
- Xinshu Zhao
- Department of Communication, Faculty of Social Sciences, University of Macau, Taipa, Macao.
- Guangchao Charles Feng
- Department of Communication, Faculty of Social Sciences, University of Macau, Taipa, Macao
- Song Harris Ao
- Department of Communication, Faculty of Social Sciences, University of Macau, Taipa, Macao
- Piper Liping Liu
- Department of Communication, Faculty of Social Sciences, University of Macau, Taipa, Macao
2
Scaringella L, Górska A, Calderon D, Benitez J. Should we teach in hybrid mode or fully online? A theory and empirical investigation on the service–profit chain in MBAs. Information & Management 2022. DOI: 10.1016/j.im.2021.103573.
3
Affiliation(s)
- Matthijs J. Warrens
- Groningen Institute for Educational Research, University of Groningen, Grote Rozenstraat 3, 9712 TG Groningen, The Netherlands
4
Beckler DT, Thumser ZC, Schofield JS, Marasco PD. Reliability in evaluator-based tests: using simulation-constructed models to determine contextually relevant agreement thresholds. BMC Med Res Methodol 2018; 18:141. PMID: 30453897; PMCID: PMC6245899; DOI: 10.1186/s12874-018-0606-7.
Abstract
BACKGROUND Indices of inter-evaluator reliability are used in many fields such as computational linguistics, psychology, and medical science; however, the interpretation of resulting values and determination of appropriate thresholds lack context and are often guided only by arbitrary "rules of thumb" or simply not addressed at all. Our goal for this work was to develop a method for determining the relationship between inter-evaluator agreement and error to facilitate meaningful interpretation of values, thresholds, and reliability. METHODS Three expert human evaluators completed a video analysis task, and averaged their results together to create a reference dataset of 300 time measurements. We simulated unique combinations of systematic error and random error onto the reference dataset to generate 4900 new hypothetical evaluators (each with 300 time measurements). The systematic errors and random errors made by the hypothetical evaluator population were approximated as the mean and variance of a normally-distributed error signal. Calculating the error (using percent error) and inter-evaluator agreement (using Krippendorff's alpha) between each hypothetical evaluator and the reference dataset allowed us to establish a mathematical model and value envelope of the worst possible percent error for any given amount of agreement. RESULTS We used the relationship between inter-evaluator agreement and error to make an informed judgment of an acceptable threshold for Krippendorff's alpha within the context of our specific test. To demonstrate the utility of our modeling approach, we calculated the percent error and Krippendorff's alpha between the reference dataset and a new cohort of trained human evaluators and used our contextually-derived Krippendorff's alpha threshold as a gauge of evaluator quality. 
Although all evaluators had relatively high agreement (> 0.9) compared to the rule of thumb (0.8), our agreement threshold permitted evaluators with low error, while rejecting one evaluator with relatively high error. CONCLUSIONS We found that our approach established threshold values of reliability, within the context of our evaluation criteria, that were far less permissive than the typically accepted "rule of thumb" cutoff for Krippendorff's alpha. This procedure provides a less arbitrary method for determining a reliability threshold and can be tailored to work within the context of any reliability index.
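The simulation idea above (a reference dataset plus a normally distributed error signal, then agreement versus error) can be sketched as follows. The reference values, bias/noise parameters, and percent-error definition here are illustrative assumptions, not the paper's data; the alpha function is Krippendorff's interval-data form for two observers with complete data:

```python
import numpy as np

def krippendorff_alpha_interval(x, y):
    """Krippendorff's alpha, interval metric, two observers, no missing data."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    do = np.mean((x - y) ** 2)                      # observed within-unit disagreement
    v = np.concatenate([x, y])
    N = v.size
    de = (2 * N * np.sum(v ** 2) - 2 * np.sum(v) ** 2) / (N * (N - 1))  # expected
    return 1.0 - do / de

rng = np.random.default_rng(0)
reference = rng.uniform(5.0, 15.0, size=300)        # stand-in for the 300 time measurements

def hypothetical_evaluator(bias, sd):
    """Apply systematic error (bias) and random error (sd) to the reference,
    then report agreement with the reference and mean absolute percent error."""
    est = reference + bias + rng.normal(0.0, sd, size=reference.size)
    alpha = krippendorff_alpha_interval(reference, est)
    pct_error = 100 * np.mean(np.abs(est - reference) / reference)
    return alpha, pct_error

# Sweeping bias and sd over a grid traces out the worst-case error envelope
# for any given alpha, analogous to the paper's 4900-evaluator simulation.
```

Plotting the grid of (alpha, percent error) pairs yields the value envelope from which a contextual alpha threshold can be read off.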
Affiliation(s)
- Dylan T Beckler
- Laboratory for Bionic Integration, Department of Biomedical Engineering, ND20, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH, 44195, USA
- Zachary C Thumser
- Laboratory for Bionic Integration, Department of Biomedical Engineering, ND20, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH, 44195, USA
- Jonathon S Schofield
- Laboratory for Bionic Integration, Department of Biomedical Engineering, ND20, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH, 44195, USA
- Paul D Marasco
- Laboratory for Bionic Integration, Department of Biomedical Engineering, ND20, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH, 44195, USA.
5
Sgammato A, Donoghue JR. On the Performance of the Marginal Homogeneity Test to Detect Rater Drift. Applied Psychological Measurement 2018; 42:307-320. PMID: 29881127; PMCID: PMC5978607; DOI: 10.1177/0146621617730390.
Abstract
When constructed response items are administered repeatedly, "trend scoring" can be used to test for rater drift. In trend scoring, raters rescore responses from the previous administration. Two simulation studies evaluated the utility of Stuart's Q measure of marginal homogeneity as a way of evaluating rater drift when monitoring trend scoring. In the first study, data were generated based on trend scoring tables obtained from an operational assessment. The second study tightly controlled table margins to disentangle certain features present in the empirical data. In addition to Q, the paired t test was included as a comparison, because of its widespread use in monitoring trend scoring. Sample size, number of score categories, interrater agreement, and symmetry/asymmetry of the margins were manipulated. For identical margins, both statistics had good Type I error control. For a unidirectional shift in margins, both statistics had good power. As expected, when shifts in the margins were balanced across categories, the t test had little power. Q demonstrated good power for all conditions and identified almost all items identified by the t test. Q shows substantial promise for monitoring of trend scoring.
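Stuart's Q can be computed directly from the square table of original-score by rescore frequencies. A minimal sketch (the table is a toy example, not the paper's operational data; assumes NumPy is available):

```python
import numpy as np

def stuart_maxwell_q(table):
    """Stuart's Q statistic for marginal homogeneity of a square k x k table
    (rows: original scores, columns: trend rescores of the same responses).
    Compare Q with a chi-square critical value on k - 1 degrees of freedom."""
    N = np.asarray(table, float)
    k = N.shape[0]
    d = (N.sum(axis=1) - N.sum(axis=0))[:-1]     # margin differences, last category dropped
    S = -(N + N.T)[:-1, :-1]                     # off-diagonal covariance terms
    np.fill_diagonal(S, (N.sum(axis=1) + N.sum(axis=0) - 2 * np.diag(N))[:-1])
    return float(d @ np.linalg.solve(S, d)), k - 1

# Toy 3-category trend-scoring table:
Q, df = stuart_maxwell_q([[20, 5, 3], [10, 30, 4], [2, 6, 20]])
```

For this toy table Q is about 0.8 on 2 degrees of freedom, well below the .05 critical value of 5.99, so these margins show no drift. Because Q aggregates shifts in all margins jointly, a balanced shift that cancels in the mean still moves Q, which is the advantage over the paired t test noted above.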
6
Projections of Future Land Use in Bangladesh under the Background of Baseline, Ecological Protection and Economic Development. Sustainability 2017. DOI: 10.3390/su9040505.
7
Ganjali M, Moradzadeh N, Baghfalaki T. Bayesian testing of agreement criteria under order constraints. J Korean Stat Soc 2017. DOI: 10.1016/j.jkss.2016.06.004.
8
Mielke PW, Berry KJ, Johnston JE. The Exact Variance of Weighted Kappa with Multiple Raters. Psychol Rep 2016; 101:655-60. DOI: 10.2466/pr0.101.2.655-660.
Abstract
Weighted kappa, described by Cohen in 1968, is widely used in psychological research to measure agreement between two independent raters. Everitt subsequently provided the exact variance of weighted kappa for two raters. In this paper, Everitt's exact variance is extended to three or more raters.
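For two raters, the point estimate that Everitt's variance accompanies can be sketched as follows (the frequencies are a toy example; quadratic and linear disagreement weights are the two common choices):

```python
import numpy as np

def weighted_kappa(table, weights="quadratic"):
    """Cohen's weighted kappa for a square observed-frequency table
    (rows: rater 1 categories, columns: rater 2 categories)."""
    O = np.array(table, float)
    O /= O.sum()                                  # joint proportions
    k = O.shape[0]
    i, j = np.indices((k, k))
    if weights == "linear":
        W = np.abs(i - j) / (k - 1)               # disagreement weights in [0, 1]
    else:
        W = ((i - j) / (k - 1)) ** 2
    E = np.outer(O.sum(axis=1), O.sum(axis=0))    # expected under independence
    return 1 - (W * O).sum() / (W * E).sum()
```

For a 2 x 2 table the weighted and unweighted statistics coincide; the weights matter only when partial disagreement across ordered categories should count less than full disagreement.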
9
Sicoly F. Estimating the Accuracy of Decisions Based on Cutting Scores. Journal of Psychoeducational Assessment 2016. DOI: 10.1177/073428299201000102.
Abstract
Classifications based on test results are used routinely in educational research and practice. Although test validity usually is expressed as a correlation coefficient, this does not indicate the expected accuracy of pass-fail or eligible-not eligible decisions based on test scores. This paper describes expectancy tables for converting a validity or test-retest reliability coefficient, r, into measures of classification accuracy for dichotomous categories. Results for a sample of correlation coefficients and cut-off scores are reported using several indicators of accuracy: sensitivity, efficiency, specificity, hit rate, and kappa. It appears that a validity coefficient above .90 is required to achieve a kappa above .70 and to keep false positive and false negative error rates below 25%. This suggests that many tests and measures considered to have adequate validity (between .60 and .90) often will have limited utility in making diagnostic, placement, or treatment decisions.
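The mapping from a validity coefficient r to decision accuracy can also be approximated by simulation rather than by expectancy tables. A sketch under added assumptions (bivariate-normal test and criterion scores, both dichotomized at the median):

```python
import numpy as np

def classification_accuracy(r, cut=0.0, n=200_000, seed=1):
    """Monte Carlo version of the expectancy-table idea: draw (test, criterion)
    pairs from a bivariate normal with correlation r, dichotomize both at
    z = cut, and summarize pass-fail classification accuracy."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = r * z1 + np.sqrt(1 - r * r) * rng.standard_normal(n)  # correlated criterion
    test, crit = z1 >= cut, z2 >= cut
    a = np.mean(test & crit)                     # true positives
    d = np.mean(~test & ~crit)                   # true negatives
    ao = a + d                                   # hit rate
    pe = np.mean(test) * np.mean(crit) + np.mean(~test) * np.mean(~crit)
    return {"hit_rate": ao,
            "sensitivity": a / np.mean(crit),
            "specificity": d / np.mean(~crit),
            "kappa": (ao - pe) / (1 - pe)}
```

With this setup, r = .60 yields a kappa near .41, and kappa does not clear .70 until r approaches the low .90s, consistent with the abstract's conclusion.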
Affiliation(s)
- Fiore Sicoly
- East York Board of Education, 840 Coxwell Avenue, Toronto, Ontario, Canada M4C 2V3
10
McGrath RE, Pogge DL, Stokes JM, Cragnolino A, Zaccario M, Hayman J, Piacentini T, Wayland-Smith D. Field Reliability of Comprehensive System Scoring in an Adolescent Inpatient Sample. Assessment 2016; 12:199-209. PMID: 15914721; DOI: 10.1177/1073191104273384.
Abstract
The extent to which the Comprehensive System for the Rorschach is reliably scored has been a topic of some controversy. Although several studies have concluded that it can be scored reliably in research settings, little is known about its reliability in field settings. This study evaluated the reliability of both response-level codes and protocol-level scores among 84 adolescent psychiatric inpatients in a clinical setting. Rorschachs were originally administered and scored for clinical purposes. Among response codes, 87% demonstrated acceptable reliability (> .60), and most coefficients exceeded .80. Results were similar for protocol-level scores, with only one score demonstrating less than adequate reliability. The findings are consistent with previous evidence indicating that reliable scoring is possible even in field settings.
Affiliation(s)
- Robert E McGrath
- School of Psychology, Fairleigh Dickinson University, Teaneck, NJ 07666, USA.
11
Friedrich J, Fetherstonhaugh D, Casey S, Gallagher D. Argument Integration and Attitude Change: Suppression Effects in the Integration of One-Sided Arguments that Vary in Persuasiveness. Personality and Social Psychology Bulletin 2016. DOI: 10.1177/0146167296222007.
Abstract
Petty and Cacioppo's elaboration likelihood model of persuasion and Chaiken, Liberman, and Eagly's heuristic-systematic model suggest that for highly involved message recipients, adding weaker arguments to strong arguments could suppress or dilute the overall persuasiveness of a message. The few previous studies addressing this prediction, however, have provided conflicting evidence. In the present study, involvement level, number of weak arguments, and number of strong arguments were varied factorially in a message advocating the institution of senior comprehensive exams. Results provided clear support for the predicted weak-argument suppression of attitude change. Analyses of thought-listing data supported the notion that suppression results from the integration of favorable and unfavorable cognitive responses to the communication. Further research questions regarding the processes by which mixed-quality messages exert their persuasive impact are discussed.
12
Amendola LM, Jarvik GP, Leo MC, McLaughlin HM, Akkari Y, Amaral MD, Berg JS, Biswas S, Bowling KM, Conlin LK, Cooper GM, Dorschner MO, Dulik MC, Ghazani AA, Ghosh R, Green RC, Hart R, Horton C, Johnston JJ, Lebo MS, Milosavljevic A, Ou J, Pak CM, Patel RY, Punj S, Richards CS, Salama J, Strande NT, Yang Y, Plon SE, Biesecker LG, Rehm HL. Performance of ACMG-AMP Variant-Interpretation Guidelines among Nine Laboratories in the Clinical Sequencing Exploratory Research Consortium. Am J Hum Genet 2016; 98:1067-1076. PMID: 27181684; DOI: 10.1016/j.ajhg.2016.03.024.
Abstract
Evaluating the pathogenicity of a variant is challenging given the plethora of types of genetic evidence that laboratories consider. Deciding how to weigh each type of evidence is difficult, and standards have been needed. In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published guidelines for the assessment of variants in genes associated with Mendelian diseases. Nine molecular diagnostic laboratories involved in the Clinical Sequencing Exploratory Research (CSER) consortium piloted these guidelines on 99 variants spanning all categories (pathogenic, likely pathogenic, uncertain significance, likely benign, and benign). Nine variants were distributed to all laboratories, and the remaining 90 were evaluated by three laboratories. The laboratories classified each variant by using both the laboratory's own method and the ACMG-AMP criteria. The agreement between the two methods used within laboratories was high (K-alpha = 0.91) with 79% concordance. However, there was only 34% concordance for either classification system across laboratories. After consensus discussions and detailed review of the ACMG-AMP criteria, concordance increased to 71%. Causes of initial discordance in ACMG-AMP classifications were identified, and recommendations on clarification and increased specification of the ACMG-AMP criteria were made. In summary, although an initial pilot of the ACMG-AMP guidelines did not lead to increased concordance in variant interpretation, comparing variant interpretations to identify differences and having a common framework to facilitate resolution of those differences were beneficial for improving agreement, allowing iterative movement toward increased reporting consistency for variants in genes associated with monogenic disease.
Affiliation(s)
- Laura M Amendola
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Gail P Jarvik
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA.
- Michael C Leo
- Center for Health Research, Kaiser Permanente, Portland, OR 97227, USA
- Heather M McLaughlin
- Laboratory for Molecular Medicine, Partners HealthCare Personalized Medicine, Cambridge, MA 02139, USA
- Yassmine Akkari
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR 97239, USA
- Jonathan S Berg
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Sawona Biswas
- Division of Human Genetics, Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kevin M Bowling
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Laura K Conlin
- Division of Human Genetics, Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Greg M Cooper
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Michael O Dorschner
- Center for Precision Diagnostics, Department of Pathology, University of Washington, Seattle, WA 98195, USA
- Matthew C Dulik
- Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Arezou A Ghazani
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Robert C Green
- Laboratory for Molecular Medicine, Partners HealthCare Personalized Medicine, Cambridge, MA 02139, USA; Brigham and Women's Hospital and Harvard Medical School, Cambridge, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ragan Hart
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Carrie Horton
- Clinical Diagnostics, Ambry Genetics, Aliso Viejo, CA 92656, USA
- Jennifer J Johnston
- Intramural Research Program, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
- Matthew S Lebo
- Laboratory for Molecular Medicine, Partners HealthCare Personalized Medicine, Cambridge, MA 02139, USA; Brigham and Women's Hospital and Harvard Medical School, Cambridge, MA 02115, USA
- Jeffrey Ou
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Christine M Pak
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR 97239, USA
- Sumit Punj
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR 97239, USA
- Carolyn Sue Richards
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR 97239, USA
- Joseph Salama
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Natasha T Strande
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Yaping Yang
- Baylor College of Medicine, Houston, TX 77030, USA
- Leslie G Biesecker
- Intramural Research Program, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
- Heidi L Rehm
- Laboratory for Molecular Medicine, Partners HealthCare Personalized Medicine, Cambridge, MA 02139, USA; Brigham and Women's Hospital and Harvard Medical School, Cambridge, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
13
Kirilenko AP, Stepchenkova S. Inter-Coder Agreement in One-to-Many Classification: Fuzzy Kappa. PLoS One 2016; 11:e0149787. PMID: 26933956; PMCID: PMC4775035; DOI: 10.1371/journal.pone.0149787.
Abstract
Content analysis involves classification of textual, visual, or audio data. Inter-coder agreement is estimated by having two or more coders classify the same data units and then comparing their results. The existing methods of agreement estimation, e.g., Cohen's kappa, require that coders place each unit of content into one and only one category (one-to-one coding) from the pre-established set of categories. However, in certain data domains (e.g., maps, photographs, databases of texts and images), this requirement seems overly restrictive. The restriction could be lifted, provided that there is a measure of inter-coder agreement for the one-to-many protocol. Building on existing approaches to one-to-many coding in geography and biomedicine, such a measure, fuzzy kappa, an extension of Cohen's kappa, is proposed. It is argued that the measure is especially compatible with data from domains where the holistic reasoning of human coders is utilized to describe the data and assess the meaning of communication.
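The one-to-many setting can be made concrete with a small illustration. The sketch below is not the fuzzy kappa defined in the paper; it is a generic chance-corrected overlap measure (mean per-unit Jaccard agreement, with chance estimated by permuting one coder's assignments across units), included only to show the shape of the problem:

```python
import random

def set_agreement(units_a, units_b, n_perm=1000, seed=7):
    """Chance-corrected agreement for one-to-many coding: each coder assigns a
    non-empty SET of categories per unit. Observed agreement is the mean Jaccard
    overlap per unit; chance agreement is a permutation estimate.
    (Illustrative stand-in, not Kirilenko & Stepchenkova's fuzzy kappa.)"""
    jac = lambda a, b: len(a & b) / len(a | b)
    n = len(units_a)
    ao = sum(jac(a, b) for a, b in zip(units_a, units_b)) / n
    rng = random.Random(seed)
    shuffled, pe = list(units_b), 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                 # break unit alignment to estimate chance
        pe += sum(jac(a, b) for a, b in zip(units_a, shuffled)) / n
    pe /= n_perm
    return (ao - pe) / (1 - pe)               # same chance-removal form as kappa
```

Identical codings score 1, and agreement no better than the permuted baseline scores at or below 0, mirroring how kappa-family indices behave in the one-to-one case.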
Affiliation(s)
- Andrei P Kirilenko
- The Department of Tourism, Recreation and Sport Management, University of Florida, P.O. Box 118208, Gainesville, FL, 32611-8208, United States of America
- Svetlana Stepchenkova
- The Department of Tourism, Recreation and Sport Management, University of Florida, P.O. Box 118208, Gainesville, FL, 32611-8208, United States of America
14
Mooney SJ, DiMaggio CJ, Lovasi GS, Neckerman KM, Bader MDM, Teitler JO, Sheehan DM, Jack DW, Rundle AG. Use of Google Street View to Assess Environmental Contributions to Pedestrian Injury. Am J Public Health 2016; 106:462-9. PMID: 26794155; DOI: 10.2105/ajph.2015.302978.
Abstract
OBJECTIVES To demonstrate an information technology-based approach to assess characteristics of streets and intersections associated with injuries that is less costly and time-consuming than location-based studies of pedestrian injury. METHODS We used imagery captured by Google Street View from 2007 to 2011 to assess 9 characteristics of 532 intersections within New York City. We controlled for estimated pedestrian count and estimated the relation between intersections' characteristics and frequency of injurious collisions. RESULTS The count of pedestrian injuries at intersections was associated with the presence of marked crosswalks (80% increase; 95% confidence interval [CI] = 2%, 218%), pedestrian signals (156% increase; 95% CI = 69%, 259%), nearby billboards (42% increase; 95% CI = 7%, 90%), and bus stops (120% increase; 95% CI = 51%, 220%). Injury incidence per pedestrian was lower at intersections with higher estimated pedestrian volumes. CONCLUSIONS Consistent with in-person study observations, the information-technology approach found traffic islands, visual advertising, bus stops, and crosswalk infrastructures to be associated with elevated counts of pedestrian injury in New York City. Virtual site visits for pedestrian injury control studies are a viable and informative methodology.
Affiliation(s)
- Stephen J. Mooney, Charles J. DiMaggio, Gina S. Lovasi, Daniel M. Sheehan, and Andrew G. Rundle are with Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY. Kathryn M. Neckerman is with Columbia Population Research Center, Columbia University. Michael D. M. Bader is with Department of Sociology, American University, Washington, DC. Julien O. Teitler is with School of Social Work, Columbia University. Darby W. Jack is with Department of Environmental Health Sciences, Mailman School of Public Health.
15
Moradzadeh N, Ganjali M, Baghfalaki T. Weighted kappa as a function of unweighted kappas. Communications in Statistics - Simulation and Computation 2015. DOI: 10.1080/03610918.2015.1105975.
16
Nussbeck FW, Eid M. Multimethod latent class analysis. Front Psychol 2015; 6:1332. PMID: 26441714; PMCID: PMC4584970; DOI: 10.3389/fpsyg.2015.01332.
Abstract
Correct and, hence, valid classifications of individuals are of high importance in the social sciences as these classifications are the basis for diagnoses and/or the assignment to a treatment. The via regia to inspect the validity of psychological ratings is the multitrait-multimethod (MTMM) approach. First, a latent variable model for the analysis of rater agreement (latent rater agreement model) will be presented that allows for the analysis of convergent validity between different measurement approaches (e.g., raters). Models of rater agreement are transferred to the level of latent variables. Second, the latent rater agreement model will be extended to a more informative MTMM latent class model. This model allows for estimating (i) the convergence of ratings, (ii) method biases in terms of differential latent distributions of raters and differential associations of categorizations within raters (specific rater bias), and (iii) the distinguishability of categories indicating if categories are satisfyingly distinct from each other. Finally, an empirical application is presented to exemplify the interpretation of the MTMM latent class model.
Affiliation(s)
- Michael Eid
- Department of Education and Psychology, Freie Universitaet Berlin, Berlin, Germany

17
Harmon-Walker G, Kaiser DH. The Bird's Nest Drawing: A study of construct validity and interrater reliability. Arts in Psychotherapy 2015. [DOI: 10.1016/j.aip.2014.12.008]
18
A sequential examination of parent-child interactions at anesthetic induction. J Clin Psychol Med Settings 2014; 21:374-85. [PMID: 25352168] [DOI: 10.1007/s10880-014-9413-4]
Abstract
Parental presence is often employed to alleviate distress in children within the context of surgery under general anesthesia. The critical component of this intervention may not be the presence of the parent per se, but more importantly the behaviors in which the parent and child engage when the parent is present. The purpose of the current study was to examine the sequential and reciprocal relationships between parental behaviors and child distress during induction of general anesthesia. Participants were 32 children (3-6 years) receiving dental surgery as a day surgery procedure, and their parents. A modified Child Adult Medical Procedures Interaction Scale-Revised was used to code parent and child behaviors. Initial child distress led to increased parental provision of reassurance and decreased provision of physical comfort. Our findings may inform the development of preoperative preparation programs whereby parents can be appropriately educated about what behaviors will be helpful/unhelpful for their child during induction of general anesthesia.
19
Chang CH, Yang JT, Lee MH. A Novel “Maximizing Kappa” Approach for Assessing the Ability of a Diagnostic Marker and Its Optimal Cutoff Value. J Biopharm Stat 2014; 25:1005-19. [DOI: 10.1080/10543406.2014.920347]
20

21
Karelitz TM, Budescu DV. The Effect of the Raters' Marginal Distributions on Their Matched Agreement: A Rescaling Framework for Interpreting Kappa. Multivariate Behav Res 2013; 48:923-952. [PMID: 26745599] [DOI: 10.1080/00273171.2013.830064]
Abstract
Cohen's κ measures the improvement in classification above chance level and it is the most popular measure of interjudge agreement. Yet, there is considerable confusion about its interpretation. Specifically, researchers often ignore the fact that the observed level of matched agreement is bounded from above and below and the bounds are a function of the particular marginal distributions of the table. We propose that these bounds should be used to rescale the components of κ (observed and expected agreement). Rescaling κ in this manner results in κ', a measure that was originally proposed by Cohen (1960) and was largely ignored in both research and practice. This measure provides a common scale for agreement measures of tables with different marginal distributions. It reaches the maximal value of 1 when the judges show the highest level of agreement possible, given their marginal disagreements. We conclude that κ' should be used to measure the level of matched agreement contingent on a particular set of marginal distributions. The article provides a framework and a set of guidelines that facilitate comparisons between various types of agreement tables. We illustrate our points with simulations and real data from two studies-one involving judges' ratings of baseball players and one involving ratings of essays in high-stakes tests.
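The rescaling described above is easy to make concrete. A minimal sketch (not the authors' code; the agreement table is hypothetical) computes Cohen's κ and the rescaled κ/κ_max, where the maximum attainable observed agreement given the marginals is Σ min(row_i, col_i):

```python
def kappa_and_rescaled(table):
    """Cohen's kappa and the rescaled kappa/kappa_max for a square
    agreement table (rows: rater 1, columns: rater 2)."""
    k = len(table)
    n = float(sum(sum(row) for row in table))
    p = [[cell / n for cell in row] for row in table]
    po = sum(p[i][i] for i in range(k))                        # observed agreement
    rows = [sum(p[i][j] for j in range(k)) for i in range(k)]  # rater-1 marginals
    cols = [sum(p[i][j] for i in range(k)) for j in range(k)]  # rater-2 marginals
    pe = sum(rows[i] * cols[i] for i in range(k))              # chance agreement
    p_max = sum(min(rows[i], cols[i]) for i in range(k))       # upper bound on po
    kappa = (po - pe) / (1 - pe)
    kappa_max = (p_max - pe) / (1 - pe)
    return kappa, kappa / kappa_max

# Hypothetical 2x2 table: two raters agree on 85 of 100 items
kappa, kappa_prime = kappa_and_rescaled([[40, 10], [5, 45]])
print(round(kappa, 3), round(kappa_prime, 3))  # prints "0.7 0.778"
```

Because κ_max ≤ 1 whenever the marginals disagree, κ/κ_max reaches 1 exactly when the raters agree as much as their marginal distributions permit.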
Affiliation(s)
- Tzur M Karelitz
- National Institute for Testing and Evaluation, Jerusalem, Israel

22
Parpia S, Koval JJ, Donner A. Evaluation of confidence intervals for the kappa statistic when the assumption of marginal homogeneity is violated. Comput Stat 2013. [DOI: 10.1007/s00180-013-0424-7]
23

24
Rigor in qualitative supply chain management research. International Journal of Physical Distribution & Logistics Management 2012. [DOI: 10.1108/09600031211269767]
25
Rotondi MA, Donner A. A confidence interval approach to sample size estimation for interobserver agreement studies with multiple raters and outcomes. J Clin Epidemiol 2012; 65:778-84. [DOI: 10.1016/j.jclinepi.2011.10.019]

26
Cohen’s quadratically weighted kappa is higher than linearly weighted kappa for tridiagonal agreement tables. Stat Methodol 2012. [DOI: 10.1016/j.stamet.2011.08.006]

27
Warrens MJ. A family of multi-rater kappas that can always be increased and decreased by combining categories. Stat Methodol 2012. [DOI: 10.1016/j.stamet.2011.08.008]
28

29
Fakhri A, Pakpour AH, Burri A, Morshedi H, Zeidi IM. The Female Sexual Function Index: Translation and Validation of an Iranian Version. J Sex Med 2012; 9:514-23. [DOI: 10.1111/j.1743-6109.2011.02553.x]
30

31

32

33
Jaudi S, Du Montcel ST, Fries N, Nizard J, Desfontaines VH, Dommergues M. Online evaluation of fetal second-trimester four-chamber view images: a comparison of six evaluation methods. Ultrasound Obstet Gynecol 2011; 38:185-190. [PMID: 21308829] [DOI: 10.1002/uog.8941]
Abstract
OBJECTIVE To compare six online evaluation methods for auditing routine second-trimester four-chamber view still images. METHODS We evaluated three different scoring grids (subjective, five-item score and seven-item score), which were applied with or without access to online help, resulting in a total of six evaluation methods. For the subjective scoring grid, images were rated as excellent, good, fair, poor or very poor. For the five-item score, 1 point was allocated for visualization (vs non-visualization or non-evaluable) of each of: heart crux, atria, ventricles, apex and aorta, yielding a score of 0-5. For the seven-item score, 1 point was allocated for clear (vs unclear) visualization of each of: moderator band at the apex, interventricular septum, atrioventricular valves, non-linear insertion of atrioventricular valves (normal offset), septum primum, aorta and pulmonary vein. Each evaluation method was used via the Internet by three randomly selected reviewers, who evaluated the same set of 80 images. Reviewers were experienced in fetal ultrasound, but were not involved in the design of the study. Interrater agreement was the main outcome. RESULTS The five-item scoring grid with online help achieved the best interrater agreement (interrater intraclass correlation coefficient = 0.7). CONCLUSIONS Evaluation of the second-trimester sonographic four-chamber view is apparently best achieved with a simple five-item scoring grid.
Affiliation(s)
- S Jaudi
- Service de Gynécologie Obstétrique, Groupe Hospitalier Pitié-Salpêtrière, APHP, Paris, France

34
Kottner J, Streiner DL. The difference between reliability and agreement. J Clin Epidemiol 2011; 64:701-2; author reply 702. [PMID: 21411278] [DOI: 10.1016/j.jclinepi.2010.12.001]
35
Warrens MJ. Weighted kappa is higher than Cohen’s kappa for tridiagonal agreement tables. Stat Methodol 2011. [DOI: 10.1016/j.stamet.2010.09.004]

36
Lee J, Imanaka Y, Sekimoto M, Nishikawa H, Ikai H, Motohashi T. Validation of a novel method to identify healthcare-associated infections. J Hosp Infect 2011; 77:316-20. [PMID: 21277647] [DOI: 10.1016/j.jhin.2010.11.013]
Abstract
Despite its potential for use in large-scale analyses, previous attempts to utilise administrative data to identify healthcare-associated infections (HAI) have been shown to be unsuccessful. In this study, we validate the accuracy of a novel method of HAI identification based on antibiotic utilisation patterns derived from administrative data. We contemporaneously and independently identified HAIs using both chart review analysis and our method from four Japanese hospitals (N=584). The accuracy of our method was quantified using sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) relative to chart review analysis. We also analysed the inter-rater agreement between both identification methods using Cohen's kappa coefficient. Our method showed a sensitivity of 0.93 (95% CI: 0.87-0.96), specificity of 0.91 (0.89-0.94), PPV of 0.75 (0.68-0.81) and NPV of 0.98 (0.96-0.99). A kappa coefficient of 0.78 indicated a relatively high level of agreement between the two methods. Our results show that our method has sufficient validity for identification of HAIs in large groups of patients, though the relatively lower PPV may imply limited utilisation in the pinpointing of individual infections. Our method may have applications in large-scale HAI identification, risk-adjusted multicentre studies involving cost of illness, or even as the starting point of future cost-effectiveness analyses of HAI control measures.
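Accuracy measures of the kind reported above can all be reproduced from a single 2×2 validation table. A minimal sketch (the counts are hypothetical, not the study's data):

```python
def validation_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV, and Cohen's kappa for a 2x2
    table comparing a test method against a reference standard."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)                 # true positives / all diseased
    spec = tn / (tn + fp)                 # true negatives / all healthy
    ppv = tp / (tp + fp)                  # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    po = (tp + tn) / n                    # observed agreement
    # chance agreement from the marginals of the 2x2 table
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    return sens, spec, ppv, npv, kappa

# Hypothetical counts: 90 TP, 10 FP, 10 FN, 90 TN
print(validation_metrics(90, 10, 10, 90))
```

Note that sensitivity and specificity ignore the marginals, while kappa penalizes the agreement expected by chance, which is why the two kinds of measure can rank methods differently.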
Affiliation(s)
- J Lee
- Department of Healthcare Economics and Quality Management, School of Public Health, Graduate School of Medicine, Kyoto University, Japan

37

38
Viehweger E, Pfund LZ, Hélix M, Rohon MA, Jacquemier M, Scavarda D, Jouve JL, Bollini G, Loundou A, Simeoni MC. Influence of clinical and gait analysis experience on reliability of observational gait analysis (Edinburgh Gait Score Reliability). Ann Phys Rehabil Med 2010; 53:535-46. [DOI: 10.1016/j.rehab.2010.09.002]
39
Warrens MJ. Cohen’s kappa can always be increased and decreased by combining categories. Stat Methodol 2010. [DOI: 10.1016/j.stamet.2010.05.003]

40
Cernicchiaro N, Pearl DL, McEwen SA, LeJeune JT. Assessment of diagnostic tools for identifying cattle shedding and super-shedding Escherichia coli O157:H7 in a longitudinal study of naturally infected feedlot steers in Ohio. Foodborne Pathog Dis 2010; 8:239-48. [PMID: 21034264] [DOI: 10.1089/fpd.2010.0666]
Abstract
The objectives of this study were to compare the performance of different diagnostic protocols (rectoanal mucosal swabs and immunomagnetic separation [RAMS-IMS], fecal samples and IMS [fecal-IMS], and direct plating) to determine the prevalence of Escherichia coli O157:H7 and to evaluate the pattern of E. coli O157:H7 shedding and super-shedding (defined as having a direct plating count equal to or >10(4) colony forming units of E. coli O157:H7 per gram of feces) in a longitudinal study of naturally infected feedlot steers. RAMS and fecal grab samples were obtained at 14-day intervals from 168 Angus-cross beef steers over a period of 22 weeks. Fecal samples were assessed by direct plating and IMS, whereas RAMS were tested only by enrichment followed by IMS to recover E. coli O157:H7. The period prevalence for shedding was high (62%) among feedlot steers and super-shedding was higher (23%) than anticipated. Although direct plating was the least sensitive method to detect E. coli O157:H7-positive samples, over 20% of high bacterial load samples were not detected by RAMS-IMS and/or fecal-IMS. The sensitivity of RAMS-IMS, fecal-IMS, and direct plating protocols was estimated using simple and multilevel mixed-effects logistic regression models, in which the dependent variable was the dichotomous results of each test and gold standard (i.e., parallel interpretation of the three protocols)-positive individuals were included as an independent variable along with other factors such as dietary supplements, time of sampling, and being exposed to a super-shedding pen-mate. The associations between these factors and the sensitivity of the diagnostic protocols were not statistically significant. In conclusion, differences in the reported impact of diet and probiotics on the shedding of E. coli O157:H7 in previous studies using RAMS-IMS or fecal-IMS were unlikely due to their impact on test performance.
Affiliation(s)
- Natalia Cernicchiaro
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Canada

41

42
Peirce's i and Cohen's κ for 2×2 Measures of Rater Reliability. Journal of Probability and Statistics 2010. [DOI: 10.1155/2010/480364]
Abstract
This study examined a historical mixture model approach to the evaluation of ratings made in “gold standard” and two-rater 2×2 contingency tables. Peirce's i and the derived i average were discussed in relation to a widely used index of reliability in the behavioral sciences, Cohen's κ. Sample size, population base rate of occurrence, the true “science of the method”, and guessing rates were manipulated across simulations. In “gold standard” situations, Peirce's i tended to recover the true reliability of ratings as well as or better than κ. In two-rater situations, i_ave tended to recover the true reliability as well as or better than κ in most situations. The empirical utility and potential theoretical benefits of mixture model methods in estimating reliability are discussed, as are the associations between the i statistics and other modern mixture model approaches.
43
Mielke PW, Berry KJ, Johnston JE. Resampling Probability Values for Weighted Kappa with Multiple Raters. Psychol Rep 2008; 102:606-13. [DOI: 10.2466/pr0.102.2.606-613]
Abstract
A new procedure to compute weighted kappa with multiple raters is described. A resampling procedure to compute approximate probability values for weighted kappa with multiple raters is presented. Applications of weighted kappa are illustrated with an example analysis of classifications by three independent raters.
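The resampling idea can be illustrated with a simplified two-rater sketch (linear weights; hypothetical data — this is not the multi-rater procedure of the paper): compute the observed weighted kappa, then estimate how often a random re-pairing of one rater's scores does at least as well.

```python
import random

def weighted_kappa(r1, r2, k):
    """Linearly weighted kappa for two raters on an ordinal k-category scale."""
    n = len(r1)
    # linear weights: full credit on the diagonal, partial credit nearby
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    po = sum(w[a][b] for a, b in zip(r1, r2)) / n       # weighted agreement
    m1 = [r1.count(c) / n for c in range(k)]            # rater-1 marginals
    m2 = [r2.count(c) / n for c in range(k)]            # rater-2 marginals
    pe = sum(w[i][j] * m1[i] * m2[j] for i in range(k) for j in range(k))
    return (po - pe) / (1 - pe)

def resampled_p_value(r1, r2, k, n_perm=999, seed=0):
    """Approximate P(kappa_w >= observed) under random re-pairing of ratings."""
    rng = random.Random(seed)
    observed = weighted_kappa(r1, r2, k)
    shuffled, hits = list(r2), 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if weighted_kappa(r1, shuffled, k) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With perfect agreement, `weighted_kappa` returns 1.0 and the resampled p-value is close to 1/(n_perm + 1), since virtually no random re-pairing matches the observed statistic.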
Affiliation(s)
- Janis E. Johnston
- AAAS Science and Technology Policy Fellow at U.S. EPA Homeland Security Research Center

44
Grayson DA. Latent trait models for validity and reliability with 2×2 tables. Australian Journal of Psychology 2007. [DOI: 10.1080/00049539808258797]
45

46
Kujan O, Oliver RJ, Khattab A, Roberts SA, Thakker N, Sloan P. Evaluation of a new binary system of grading oral epithelial dysplasia for prediction of malignant transformation. Oral Oncol 2006; 42:987-93. [PMID: 16731030] [DOI: 10.1016/j.oraloncology.2005.12.014]
Abstract
The aim of this paper is to assess the reproducibility of a novel binary grading system (high/low risk) of oral epithelial dysplasia and to compare it with the WHO classification 2005. The accuracy of the new system for predicting malignant transformation was also assessed. Ninety-six consecutive oral epithelial dysplasia biopsies with known clinical outcomes were retrieved from the Oral Pathology archives. A pilot study was conducted on 28 cases to determine the process of classification. Four observers then reviewed the same set of H&E stained slides of 68 oral dysplastic lesions using the two grading systems, blinded to the clinical outcomes. The overall inter-observer unweighted and weighted kappa agreements for the WHO grading system were Ks = 0.22 (95% CI: 0.11-0.35) and Kw = 0.63 (95% CI: 0.42-0.78), respectively, versus K = 0.50 (95% CI: 0.35-0.67) for the new binary system. Interestingly, all pathologists showed satisfactory agreement on the distinction of mild dysplasia from severe dysplasia and from carcinoma in situ using the new WHO classification. However, assessment of moderate dysplasia remains problematic. The sensitivity and specificity of the new binary grading system for predicting malignant transformation in oral epithelial dysplasia were 85% and 80%, respectively, and the accuracy was 82%. The new binary grading system complemented the WHO classification 2005 and may have merit in helping clinicians to make critical clinical decisions, particularly for cases of moderate dysplasia. Histological grading of dysplasia using established criteria is a reproducible prognosticator in oral epithelial dysplasia. Furthermore, the present study showed that consensus scoring by a panel on the degree of dysplasia, the assessment of risk, or the presence of each morphological characteristic should be encouraged.
Affiliation(s)
- Omar Kujan
- School of Dentistry, The University of Manchester, Manchester M15 6FH, United Kingdom

47
Malek IA, Machani B, Mevcha AM, Hyder NH. Inter-observer reliability and intra-observer reproducibility of the Weber classification of ankle fractures. J Bone Joint Surg Br 2006; 88:1204-6. [PMID: 16943473] [DOI: 10.1302/0301-620x.88b9.17954]
Abstract
Our aim was to assess the reproducibility and the reliability of the Weber classification system for fractures of the ankle based on anteroposterior and lateral radiographs. Five observers with varying clinical experience reviewed 50 sets of blinded radiographs. The same observers reviewed the same radiographs again after an interval of four weeks. Inter- and intra-observer agreement was assessed based on the proportion of agreement and the values of the kappa coefficient. For inter-observer agreement, the mean kappa value was 0.61 (0.59 to 0.63) and the proportion of agreement was 78% (76% to 79%) and for intra-observer agreement the mean kappa value was 0.74 (0.39 to 0.86) with an 85% (60% to 93%) observed agreement. These results show that the Weber classification of fractures of the ankle based on two radiological views has substantial inter-observer reliability and intra-observer reproducibility.
Affiliation(s)
- I A Malek
- Leighton Hospital, Middlewich Road, Crewe, CW1 4QJ, UK

48
White RE, Thornhill S, Hampson E. Entrepreneurs and evolutionary biology: The relationship between testosterone and new venture creation. Organizational Behavior and Human Decision Processes 2006. [DOI: 10.1016/j.obhdp.2005.11.001]
49

50
Abstract
For comparing the validity of rating methods, the adjusted kappa (S coefficient) and Yule's Y index are better than Cohen's kappa which is affected by marginal probabilities. We consider a validity study in which a subject is assessed as exposed or not-exposed by two competing rating methods and the gold standard. We are interested in one of the methods, which is closer in agreement with the gold standard. We present statistical methods taking correlations into account for comparing the validity of the rating methods using S coefficient and Y index. We show how the S coefficient and Yule's Y index are related to sensitivity and specificity. In comparing the two rating methods, the preference is clear when the inference is the same for both S and Y. If the inference using S differs from that using Y, then it is not obvious how to decide a preference. This may occur when one rating method is better than the other in sensitivity but not in specificity. Numerical examples for comparing asbestos-exposure assessment methods are illustrated.
Affiliation(s)
- Jun-mo Nam
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health & Human Services, Rockville, MD 20892-7240, USA.