1
Li M, Gao Q, Yang J, Yu T. Evaluating inter-rater reliability in the context of "Sysmex UN2000 detection of protein/creatinine ratio and of renal tubular epithelial cells can be used for screening lupus nephritis": a statistical examination. BMC Nephrol 2024; 25:94. [PMID: 38481181 PMCID: PMC10938658 DOI: 10.1186/s12882-024-03540-y] [Received: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/17/2024] [Open access]
Abstract
BACKGROUND The evaluation of inter-rater reliability (IRR) is integral to research designs involving the assessment of observational ratings by two raters. However, existing literature is often heterogeneous in reporting statistical procedures and the evaluation of IRR, although such information can impact subsequent hypothesis testing analyses. METHODS This paper evaluates a recent publication by Chen et al., featured in BMC Nephrology, aiming to introduce an alternative statistical approach to assessing IRR and discuss its statistical properties. The study underscores the crucial need for selecting appropriate Kappa statistics, emphasizing the accurate computation, interpretation, and reporting of commonly used IRR statistics between two raters. RESULTS The Cohen's Kappa statistic is typically used for two raters dealing with two categories or for unordered categorical variables having three or more categories. On the other hand, when assessing the concordance between two raters for ordered categorical variables with three or more categories, the commonly employed measure is the weighted Kappa. CONCLUSION Chen and colleagues might have underestimated the agreement between AU5800 and UN2000. Although the statistical approach adopted in Chen et al.'s research did not alter their findings, it is important to underscore the importance of researchers being discerning in their choice of statistical techniques to address their specific research inquiries.
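The distinction drawn in this abstract can be illustrated with a minimal sketch. It uses only the Python standard library; the function name `cohen_kappa`, the three ordered grades, and the ten paired ratings are hypothetical and are not taken from Chen et al. or from the commentary. The sketch shows that when every disagreement falls in an adjacent ordered category, the unweighted statistic comes out lower than the weighted one.

```python
def cohen_kappa(rater_a, rater_b, categories, weights="none"):
    """Cohen's kappa for two raters.

    weights="none"      -> unweighted kappa (every disagreement counts fully)
    weights="linear"    -> weighted kappa, penalty |i - j| / (k - 1)
    weights="quadratic" -> weighted kappa, penalty (|i - j| / (k - 1)) ** 2
    `categories` must be listed in their natural order for the weighted forms.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if weights not in ("none", "linear", "quadratic"):
        raise ValueError("weights must be 'none', 'linear' or 'quadratic'")

    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Joint proportions p_ij: rater A chose category i, rater B chose category j.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]                       # rater A marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater B marginals

    def penalty(i, j):
        if weights == "none":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_disagreement = sum(penalty(i, j) * observed[i][j] for i, j in cells)
    expected_disagreement = sum(penalty(i, j) * row[i] * col[j] for i, j in cells)
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical ordinal grades (0 < 1 < 2) assigned to ten samples by two methods.
method_a = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
method_b = [0, 0, 1, 1, 1, 2, 2, 2, 2, 1]
grades = [0, 1, 2]

for scheme in ("none", "linear", "quadratic"):
    print(scheme, round(cohen_kappa(method_a, method_b, grades, scheme), 3))
```

For these made-up ratings the unweighted value is about 0.545, while the linear and quadratic weighted values are about 0.651 and 0.762. This is the sense in which an unweighted kappa can understate agreement on an ordered scale.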
Affiliation(s)
- Ming Li
  - Department of Software Engineering, College of Computer Science and Technology, Harbin Engineering University, 150001, Harbin, China.
  - Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University, 161006, Qiqihar, China.
- Qian Gao
  - Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University, 161006, Qiqihar, China.
- Jing Yang
  - Department of Software Engineering, College of Computer Science and Technology, Harbin Engineering University, 150001, Harbin, China.
- Tianfei Yu
  - Heilongjiang Provincial Key Laboratory of Resistance Gene Engineering and Protection of Biodiversity in Cold Areas, Qiqihar University, 161006, Qiqihar, China.
  - Department of Biotechnology, College of Life Science and Agriculture Forestry, Qiqihar University, 161006, Qiqihar, China.
2
Li M, Gao Q, Yu T. Kappa statistic considerations in evaluating inter-rater reliability between two raters: which, when and context matters. BMC Cancer 2023; 23:799. [PMID: 37626309 PMCID: PMC10464133 DOI: 10.1186/s12885-023-11325-z] [Received: 05/07/2023] [Accepted: 08/22/2023] [Indexed: 08/27/2023] [Open access]
Abstract
BACKGROUND In research designs that rely on observational ratings provided by two raters, assessing inter-rater reliability (IRR) is a frequently required task. However, some studies fall short in properly utilizing statistical procedures, omitting essential information necessary for interpreting their findings, or inadequately addressing the impact of IRR on subsequent analyses' statistical power for hypothesis testing. METHODS This article delves into the recent publication by Liu et al. in BMC Cancer, analyzing the controversy surrounding the Kappa statistic and methodological issues concerning the assessment of IRR. The primary focus is on the appropriate selection of Kappa statistics, as well as the computation, interpretation, and reporting of two frequently used IRR statistics when there are two raters involved. RESULTS The Cohen's Kappa statistic is typically utilized to assess the level of agreement between two raters when there are two categories or for unordered categorical variables with three or more categories. On the other hand, when it comes to evaluating the degree of agreement between two raters for ordered categorical variables comprising three or more categories, the weighted Kappa is a widely used measure. CONCLUSION Despite not substantially affecting the findings of Liu et al.'s study, the statistical dispute underscores the significance of employing suitable statistical methods. Rigorous and accurate statistical results are crucial for producing trustworthy research.
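For reference, the two statistics named in this abstract are commonly defined as follows. The notation is an assumption introduced here, not taken from the article: $p_{ij}$ is the proportion of subjects placed in category $i$ by the first rater and category $j$ by the second, $p_{i+}$ and $p_{+j}$ are the row and column marginals, and $k$ is the number of categories.

```latex
% Unweighted Cohen's kappa: observed agreement corrected for chance agreement
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad p_o = \sum_{i=1}^{k} p_{ii},
\qquad p_e = \sum_{i=1}^{k} p_{i+}\, p_{+i}

% Weighted kappa, written with disagreement weights w_{ij} (w_{ii} = 0)
\kappa_w = 1 - \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\, p_{ij}}
                    {\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\, p_{i+}\, p_{+j}},
\qquad
w_{ij} = \frac{|i - j|}{k - 1} \ \text{(linear)}
\quad \text{or} \quad
w_{ij} = \left(\frac{i - j}{k - 1}\right)^{2} \ \text{(quadratic)}
```

Setting $w_{ij} = 1$ for every $i \neq j$ reduces $\kappa_w$ to $\kappa$. With two categories the weighted and unweighted forms coincide, so the choice between them matters only for ordered scales with three or more categories.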
Affiliation(s)
- Ming Li
  - Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University, Qiqihar, 161006, China.
- Qian Gao
  - Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University, Qiqihar, 161006, China.
- Tianfei Yu
  - Department of Biotechnology, College of Life Science and Agriculture Forestry, Qiqihar University, Qiqihar, 161006, China.
3
Bagatto D, Tereshko Y, Piccolo D, Fabbro S, De Colle MC, Morassi M, Belgrado E, Lettieri C, Gigli GL, Valente M, Skrap M, D'Agostini S, Tuniz F. Clinical applicability of arterial spin labeling magnetic resonance imaging in patients with possible idiopathic normal pressure hydrocephalus: A prospective preliminary study. Clin Neurol Neurosurg 2023; 227:107645. [PMID: 36871390 DOI: 10.1016/j.clineuro.2023.107645] [Received: 01/19/2023] [Revised: 02/17/2023] [Accepted: 02/18/2023] [Indexed: 02/25/2023]
Abstract
PURPOSE Patients with idiopathic normal pressure hydrocephalus (iNPH) have a global reduction of cerebral blood flow (CBF), and arterial spin labeling (ASL) MRI allows a global evaluation of CBF without the injection of contrast agents. This work aims to assess the agreement between different neuroradiologists in the qualitative evaluation of ASL CBF color maps and to correlate these data with the Tap Test. METHODS Thirty-seven patients with a diagnosis of possible iNPH consecutively underwent diagnostic MRI on a 1.5 Tesla magnet before and after the lumbar infusion test and the Tap Test. Twenty-seven patients improved after the Tap Test and were referred for surgery, while 10 patients did not improve. All the MRI examinations included a 3D pulsed ASL sequence. Two neuroradiologists independently reviewed all ASL images. They were asked to give a score (0 not improved; 1 improved) to global perfusion image quality by comparing ASL images obtained after the Tap Test with those obtained before. Comparisons of inter- and intra-reader qualitative scores were performed with Cohen's kappa. RESULTS Inter-reader agreement between the two neuroradiologists showed that qualitative scores were attributed similarly by the two readers (κ = 0.83). This technique has a good PPV (90.5%; 95% CI, 72.7-97.1%), NPV (50%; 95% CI, 34.1-65.6%), SN (70.37%; 95% CI, 49.8-86.2%), SP (80%; 95% CI, 44.4-97.5%) and accuracy (73%; 95% CI, 55.9-86.2%) when considered in the setting of possible iNPH patients. CONCLUSION ASL-MRI seems to be a promising non-invasive technique in the preoperative selection of patients affected by possible iNPH.
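The reported diagnostic figures can be related to a 2x2 table with a short sketch. The abstract does not state the cell counts; the values below (19 true positives, 2 false positives, 8 false negatives, 8 true negatives) are an assumption, back-calculated from the reported percentages and the 27 improved versus 10 not-improved patients. The function name `diagnostic_metrics` is likewise hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Point estimates of standard diagnostic accuracy measures from a 2x2 table."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),  # SN: positives detected among Tap Test responders
        "specificity": tn / (tn + fp),  # SP: negatives detected among non-responders
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / total,
    }


# Assumed counts (not given in the abstract): ASL score versus Tap Test outcome.
metrics = diagnostic_metrics(tp=19, fp=2, fn=8, tn=8)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these assumed counts the point estimates round to the values in the abstract (SN 70.37%, SP 80%, PPV 90.5%, NPV 50%, accuracy 73%). The confidence intervals are not reproduced here because the abstract does not say which interval method was used.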
Affiliation(s)
- Daniele Bagatto
  - Neuroradiology Unit, Department of Diagnostic Imaging, University of Udine, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
- Yan Tereshko
  - Clinical Neurology Unit, Udine University Hospital, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
- Daniele Piccolo
  - Neurosurgery Unit, Department of Neurosciences, University of Udine, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
  - Department of Clinical, Diagnostic and Pediatric Sciences, University of Pavia, Via Alessandro Brambilla, 74, 27100 Pavia, Italy.
- Sara Fabbro
  - Neurosurgery Unit, Department of Neurosciences, University of Udine, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
- Maria Cristina De Colle
  - Neuroradiology Unit, Department of Diagnostic Imaging, University of Udine, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
- Mauro Morassi
  - Neuroradiology Unit, Department of Diagnostic Imaging, Istituto Ospedaliero Fondazione Poliambulanza, Via Leonida Bissolati 57, 25124 Brescia, Italy.
- Enrico Belgrado
  - Neurology Unit, Department of Neurosciences, Udine University Hospital, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
- Christian Lettieri
  - Neurosurgery Unit, Department of Neurosciences, University of Udine, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
- Gian Luigi Gigli
  - Clinical Neurology Unit, Udine University Hospital, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
- Mariarosaria Valente
  - Clinical Neurology Unit, Udine University Hospital, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
- Miran Skrap
  - Neurosurgery Unit, Department of Neurosciences, University of Udine, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
- Serena D'Agostini
  - Neuroradiology Unit, Department of Diagnostic Imaging, University of Udine, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
- Francesco Tuniz
  - Neurosurgery Unit, Department of Neurosciences, University of Udine, Piazzale Santa Maria della Misericordia 15, 33100 Udine, Italy.
4
Abstract
BACKGROUND In inter-rater agreement studies, the assessment behaviour of raters can be influenced by their experience, training levels, the degree of willingness to take risks, and the availability of clear guidelines for the assessment. When the assessment behaviour of raters differentiates for some levels of an ordinal classification, a grey zone occurs between the corresponding adjacent cells to these levels around the main diagonal of the table. A grey zone introduces a negative bias to the estimate of the agreement level between the raters. In that sense, it is crucial to detect the existence of a grey zone in an agreement table. METHODS In this study, a framework composed of a metric and the corresponding threshold is developed to identify grey zones in an agreement table. The symmetry model and Cohen's kappa are used to define the metric, and the threshold is based on a nonlinear regression model. A numerical study is conducted to assess the accuracy of the developed framework. Real data examples are provided to illustrate the use of the metric and the impact of identifying a grey zone. RESULTS The sensitivity and specificity of the proposed framework are shown to be very high under moderate, substantial, and near-perfect agreement levels for [Formula: see text] and [Formula: see text] tables and sample sizes greater than or equal to 100 and 50, respectively. Real data examples demonstrate that when a grey zone is detected in the table, it is possible to report a notably higher level of agreement in the studies. CONCLUSIONS The accuracy of the proposed framework is sufficiently high; hence, it provides practitioners with a precise way to detect the grey zones in agreement tables.
5
Friedman L, Prokopenko V, Djanian S, Katrychuk D, Komogortsev OV. Factors affecting inter-rater agreement in human classification of eye movements: a comparison of three datasets. Behav Res Methods 2023; 55:417-427. [PMID: 35411475 DOI: 10.3758/s13428-021-01782-4] [Accepted: 12/22/2021] [Indexed: 11/08/2022]
Abstract
Manual classification of eye-movements is used in research and as a basis for comparison with automatic algorithms in the development phase. However, human classification will not be useful if it is unreliable and unrepeatable. Therefore, it is important to know what factors might influence and enhance the accuracy and reliability of human classification of eye-movements. In this report we compare three datasets of human manual classification, two from earlier datasets and one, our own dataset, which we present here for the first time. For inter-rater reliability, we assess both the event-level F1-score and sample-level Cohen's κ, across groups of raters. The report points to several possible influences on human classification reliability: eye-tracker quality, use of head restraint, characteristics of the recorded subjects, the availability of detailed scoring rules, and the characteristics and training of the raters.
Affiliation(s)
- Lee Friedman
  - Derrick M5, Department of Computer Science, Texas State University, 601 University Drive, San Marcos, Texas, 78640, USA.
- Vladyslav Prokopenko
  - Derrick M5, Department of Computer Science, Texas State University, 601 University Drive, San Marcos, Texas, 78640, USA.
- Shagen Djanian
  - Derrick M5, Department of Computer Science, Texas State University, 601 University Drive, San Marcos, Texas, 78640, USA.
  - Department of Computer Science, Aalborg University, Selma Lagerlofs Vej 300, 9220, Aalborg East, Denmark.
- Dmytro Katrychuk
  - Derrick M5, Department of Computer Science, Texas State University, 601 University Drive, San Marcos, Texas, 78640, USA.
- Oleg V Komogortsev
  - Derrick M5, Department of Computer Science, Texas State University, 601 University Drive, San Marcos, Texas, 78640, USA.
6
Martínez-Bello DA, López-Quílez A, Torres Prieto A. Relative risk estimation of dengue disease at small spatial scale. Int J Health Geogr 2017; 16:31. [PMID: 28810908 PMCID: PMC5558735 DOI: 10.1186/s12942-017-0104-x] [Received: 04/08/2017] [Accepted: 08/05/2017] [Indexed: 11/29/2022] [Open access]
Abstract
Background Dengue is a high-incidence arboviral disease in tropical countries around the world. Colombia is an endemic country due to the favourable environmental conditions for vector survival and spread. Dengue surveillance in Colombia is based on passive notification of cases, supporting monitoring, prediction, risk factor identification and intervention measures. Even though the surveillance network works adequately, disease mapping techniques currently developed and employed for many health problems are not widely applied. We select the Colombian city of Bucaramanga to apply Bayesian areal disease mapping models, testing the challenges and difficulties of the approach. Methods We estimated the relative risk of dengue disease by census section (a geographical unit composed of approximately 1–20 city blocks) for the period January 2008 to December 2015. We included the covariates normalized difference vegetation index (NDVI) and land surface temperature (LST), obtained from satellite images. We fitted Bayesian areal models at the complete period and annual aggregation time scales for 2008–2015, with fixed and space-varying coefficients for the covariates, using Markov Chain Monte Carlo simulations. In addition, we used Cohen's Kappa agreement measures to compare the risk from year to year, and from every year to the complete period aggregation. Results We found that NDVI provided more information than LST for estimating relative risk of dengue, although their effects were small. NDVI was directly associated with high relative risk of dengue. Risk maps of dengue were produced from the estimates obtained by the modeling process. The year-to-year risk agreement by census section was slight to fair. Conclusion The study provides an example of implementation of relative risk estimation using Bayesian models for disease mapping at small spatial scale with covariates. We relate satellite data to dengue disease, using an areal data approach, which is not commonly found in the literature. The main difficulty of the study was to find quality data for generating expected values as input for the models. We highlight the importance of creating a population registry at small spatial scale, which is not only relevant for the risk estimation of dengue but also important to the surveillance of all notifiable diseases.
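The wording "slight to fair" matches the verbal bands of the Landis and Koch scale for kappa. The abstract does not name the scale it used, so treating it as Landis and Koch is an assumption here, and the function name `landis_koch_label` is hypothetical. A minimal sketch of that mapping:

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch verbal agreement bands."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


# Hypothetical kappa values, not taken from the study.
for value in (0.12, 0.33, 0.83):
    print(value, landis_koch_label(value))
```

On this scale, "slight to fair" corresponds to kappa values between 0 and 0.40.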
Affiliation(s)
- Daniel Adyro Martínez-Bello
  - Departament d'Estadística i Investigació Operativa, Facultat de Matemátiques, Universitat de València, C/Dr Moliner 50, 46100, Burjassot, Valencia, Spain.
- Antonio López-Quílez
  - Departament d'Estadística i Investigació Operativa, Facultat de Matemátiques, Universitat de València, C/Dr Moliner 50, 46100, Burjassot, Valencia, Spain.
- Alexander Torres Prieto
  - Epidemiological Surveillance, Health Office of Department of Santander, Cl. 45 11-52, Bucaramanga, 680001, Colombia.