1
Watanabe AT, Retson T, Wang J, Mantey R, Chim C, Karimabadi H. Mammographic Breast Density Model Using Semi-Supervised Learning Reduces Inter-/Intra- Reader Variability. Diagnostics (Basel) 2023; 13:2694. [PMID: 37627953 PMCID: PMC10453732 DOI: 10.3390/diagnostics13162694] [Received: 07/03/2023] [Revised: 07/27/2023] [Accepted: 08/13/2023] [Indexed: 08/27/2023]
Abstract
Breast density is an important risk factor for breast cancer development; however, imager inconsistency in density reporting can lead to patient and clinician confusion. A deep learning (DL) model for mammographic density grading was examined in a retrospective multi-reader multi-case study consisting of 928 image pairs and assessed for impact on inter- and intra-reader variability and reading time. Seven readers assigned density categories to the images, then re-read the test set aided by the model after a 4-week washout. To measure intra-reader agreement, 100 image pairs were blindly double read in both sessions. Linear Cohen Kappa (κ) and Student's t-test were used to assess the model and reader performance. The model achieved a κ of 0.87 (95% CI: 0.84, 0.89) for four-class density assessment and a κ of 0.91 (95% CI: 0.88, 0.93) for binary non-dense/dense assessment. Superiority tests showed significant reduction in inter-reader variability (κ improved from 0.70 to 0.88, p ≤ 0.001) and intra-reader variability (κ improved from 0.83 to 0.95, p ≤ 0.01) for four-class density, and significant reduction in inter-reader variability (κ improved from 0.77 to 0.96, p ≤ 0.001) and intra-reader variability (κ improved from 0.89 to 0.97, p ≤ 0.01) for binary non-dense/dense assessment when aided by DL. The average reader mean reading time per image pair also decreased by 30%, 0.86 s (95% CI: 0.01, 1.71), with six of seven readers having reading time reductions.
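The agreement statistic named in this abstract can be illustrated with a minimal sketch of a linearly weighted Cohen's kappa for two raters. It assumes the four density categories are coded as the integers 0 to 3; the function name and the coding are illustrative assumptions, and the abstract does not state how the study itself computed κ.

```python
from collections import Counter


def linear_weighted_kappa(ratings_a, ratings_b, n_categories):
    """Linearly weighted Cohen's kappa for two raters.

    Ratings are ordinal categories coded 0..n_categories-1 (an assumed
    coding, e.g. the four density categories mapped to 0..3).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")

    n = len(ratings_a)
    k = n_categories
    pairs = Counter(zip(ratings_a, ratings_b))
    marg_a = Counter(ratings_a)
    marg_b = Counter(ratings_b)

    # Linear disagreement weight: |i - j| / (k - 1).
    observed = sum(abs(i - j) / (k - 1) * c / n for (i, j), c in pairs.items())
    expected = sum(
        abs(i - j) / (k - 1) * (marg_a[i] / n) * (marg_b[j] / n)
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected


# Hypothetical ratings for four image pairs by two readers.
reader_1 = [0, 1, 2, 3]
reader_2 = [0, 1, 2, 2]
kappa = linear_weighted_kappa(reader_1, reader_2, n_categories=4)
```

Collapsing the categories to non-dense/dense before calling the function with `n_categories=2` gives the unweighted binary kappa, since linear weights reduce to 0/1 disagreement with two categories.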
Affiliation(s)
- Alyssa T. Watanabe
  - Department of Radiology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90007, USA
  - CureMetrix, Inc., San Diego, CA 92101, USA
- Tara Retson
  - Department of Radiology, University of California, San Diego, CA 92093, USA
- Junhao Wang
  - CureMetrix, Inc., San Diego, CA 92101, USA
- Richard Mantey
  - CureMetrix, Inc., San Diego, CA 92101, USA
- Chiyung Chim
  - CureMetrix, Inc., San Diego, CA 92101, USA
2
Elfer K, Dudgeon S, Garcia V, Blenman K, Hytopoulos E, Wen S, Li X, Ly A, Werness B, Sheth MS, Amgad M, Gupta R, Saltz J, Hanna MG, Ehinger A, Peeters D, Salgado R, Gallas BD. Pilot study to evaluate tools to collect pathologist annotations for validating machine learning algorithms. J Med Imaging (Bellingham) 2022; 9:047501. [PMID: 35911208 PMCID: PMC9326105 DOI: 10.1117/1.jmi.9.4.047501] [Received: 04/14/2022] [Accepted: 06/28/2022] [Indexed: 11/14/2022]
Abstract
Purpose: Validation of artificial intelligence (AI) algorithms in digital pathology with a reference standard is necessary before widespread clinical use, but few examples focus on creating a reference standard based on pathologist annotations. This work assesses the results of a pilot study that collects density estimates of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer biopsy specimens. This work will inform the creation of a validation dataset for the evaluation of AI algorithms fit for a regulatory purpose. Approach: Collaborators and crowdsourced pathologists contributed glass slides, digital images, and annotations. Here, "annotations" refer to any marks, segmentations, measurements, or labels a pathologist adds to a report, image, region of interest (ROI), or biological feature. Pathologists estimated sTILs density in 640 ROIs from hematoxylin and eosin stained slides of 64 patients via two modalities: an optical light microscope and two digital image viewing platforms. Results: The pilot study generated 7373 sTILs density estimates from 29 pathologists. Analysis of annotations found the variability of density estimates per ROI increases with the mean; the root mean square differences were 4.46, 14.25, and 26.25 as the mean density ranged from 0% to 10%, 11% to 40%, and 41% to 100%, respectively. The pilot study informs three areas of improvement for future work: technical workflows, annotation platforms, and agreement analysis methods. Upgrades to the workflows and platforms will improve operability and increase annotation speed and consistency. Conclusions: Exploratory data analysis demonstrates the need to develop new statistical approaches for agreement. The pilot study dataset and analysis methods are publicly available to allow community feedback. The development and results of the validation dataset will be publicly available to serve as an instructive tool that can be replicated by developers and researchers.
Affiliation(s)
- Katherine Elfer
  - United States Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics & Software Reliability, Silver Spring, Maryland, United States
  - National Institutes of Health, National Cancer Institute, Division of Cancer Prevention, Cancer Prevention Fellowship Program, Bethesda, Maryland, United States
- Sarah Dudgeon
  - Yale University Computational Biology and Bioinformatics, New Haven, Connecticut, United States
  - Yale New Haven Hospital, Center for Outcomes Research and Evaluation, New Haven, Connecticut, United States
- Victor Garcia
  - United States Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics & Software Reliability, Silver Spring, Maryland, United States
- Kim Blenman
  - School of Medicine, Yale Cancer Center, Department of Internal Medicine, Section of Medical Oncology, New Haven, Connecticut, United States
  - Yale University, School of Engineering and Applied Science, Department of Computer Science, New Haven, Connecticut, United States
- Si Wen
  - United States Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics & Software Reliability, Silver Spring, Maryland, United States
- Xiaoxian Li
  - Emory University School of Medicine, Department of Pathology and Laboratory Medicine, Atlanta, Georgia, United States
- Amy Ly
  - Massachusetts General Hospital, Boston, Massachusetts, United States
- Bruce Werness
  - Inova Health System Department of Pathology, Falls Church, Virginia, United States
  - Arrive Bio LLC, San Francisco, California, United States
- Manasi S. Sheth
  - United States Food and Drug Administration (FDA), Center for Devices and Radiologic Health, Office of Product Evaluation and Quality, Office of Clinical Evidence and Analysis, Division of Biostatistics, White Oak, Maryland, United States
- Mohamed Amgad
  - Northwestern University Feinberg School of Medicine, Department of Pathology, Chicago, Illinois, United States
- Rajarsi Gupta
  - SUNY Stony Brook Medicine, Department of Biomedical Informatics, Stony Brook, New York, United States
- Joel Saltz
  - SUNY Stony Brook Medicine, Department of Biomedical Informatics, Stony Brook, New York, United States
  - SUNY Stony Brook Medicine, Department of Pathology, Stony Brook, New York, United States
- Matthew G. Hanna
  - Memorial Sloan Kettering Cancer Center, New York, New York, United States
- Anna Ehinger
  - Lund University, Laboratory Medicine, Region Skåne, Department of Genetics and Pathology, Lund, Sweden
- Dieter Peeters
  - Sint-Maarten Hospital, Department of Pathology, Mechelen, Belgium
  - University of Antwerp, Department of Biomedical Sciences, Antwerp, Belgium
- Roberto Salgado
  - Peter Mac Callum Cancer Centre, Division of Research, Melbourne, Australia
  - GZA-ZNA Hospitals, Department of Pathology, Antwerp, Belgium
- Brandon D. Gallas
  - United States Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics & Software Reliability, Silver Spring, Maryland, United States
  - Address all correspondence to Brandon D. Gallas,
3
Hsieh SS, Inoue A, Pillai PS, Gong H, Holmes DR, Cook DA, Leng S, Yu L, Carter RE, Fletcher JG, McCollough CH. A 25-reader performance study for hepatic metastasis detection: lessons from unsupervised learning. Proc SPIE Int Soc Opt Eng 2022; 12031:1203116. [PMID: 35677469 PMCID: PMC9171749 DOI: 10.1117/12.2611543] [Indexed: 06/15/2023]
Abstract
There is substantial variability in the performance of radiologist readers. We hypothesized that certain readers may have idiosyncratic weaknesses towards certain types of lesions, and unsupervised learning techniques might identify these patterns. After IRB approval, 25 radiologist readers (9 abdominal subspecialists and 16 non-specialists or trainees) read 40 portal phase liver CT exams, marking all metastases and providing a confidence rating on a scale of 1 to 100. We formed a matrix of reader confidence ratings, with rows corresponding to readers, and columns corresponding to metastases, and each matrix entry providing the confidence rating that a reader gave to the metastasis, with zero confidence used for lesions that were not marked. A clustergram was used to permute the rows and columns of this matrix to group similar readers and metastases together. This clustergram was manually interpreted. We found a cluster of lesions with atypical presentation that were missed by several readers, including subspecialists, and a separate cluster of small, subtle lesions where subspecialists were more confident of their diagnosis than trainees. These and other observations from unsupervised learning could inform targeted training and education of future radiologists.
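The matrix construction described in this abstract can be sketched as follows. The reordering step here is a simple greedy nearest-neighbour chain, used only as a lightweight stand-in for the hierarchical clustering behind a clustergram; all names and the toy data are hypothetical and not taken from the study.

```python
def build_confidence_matrix(marks, readers, lesions):
    """Rows are readers, columns are lesions.

    `marks` maps (reader, lesion) to a confidence of 1..100; lesions a
    reader did not mark get a confidence of zero, as in the abstract.
    """
    return [[marks.get((r, l), 0) for l in lesions] for r in readers]


def euclidean(u, v):
    """Euclidean distance between two equal-length rating vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def greedy_order(rows):
    """Order row indices so each row is followed by its nearest unvisited row.

    This is NOT hierarchical clustering; it only places similar rows next
    to each other so that patterns become easier to see.
    """
    if not rows:
        return []
    order = [0]
    remaining = set(range(1, len(rows)))
    while remaining:
        last = rows[order[-1]]
        # Break distance ties by index so the result is deterministic.
        nxt = min(remaining, key=lambda i: (euclidean(last, rows[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy data: three hypothetical readers, two hypothetical lesions.
readers = ["A", "B", "C"]
lesions = ["L1", "L2"]
marks = {("A", "L1"): 90, ("A", "L2"): 80, ("B", "L1"): 10,
         ("C", "L1"): 85, ("C", "L2"): 75}

matrix = build_confidence_matrix(marks, readers, lesions)
reader_order = greedy_order(matrix)
```

Applying the same ordering function to the transposed matrix would group similar lesions in the same way.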
Affiliation(s)
- Scott S Hsieh
  - Department of Radiology, Mayo Clinic, Rochester, MN, 55901, USA
- Akitoshi Inoue
  - Department of Radiology, Mayo Clinic, Rochester, MN, 55901, USA
- Hao Gong
  - Department of Radiology, Mayo Clinic, Rochester, MN, 55901, USA
- David R Holmes
  - Department of Radiology, Mayo Clinic, Rochester, MN, 55901, USA
- David A Cook
  - Department of Radiology, Mayo Clinic, Rochester, MN, 55901, USA
- Shuai Leng
  - Department of Radiology, Mayo Clinic, Rochester, MN, 55901, USA
- Lifeng Yu
  - Department of Radiology, Mayo Clinic, Rochester, MN, 55901, USA
- Rickey E Carter
  - Department of Radiology, Mayo Clinic, Rochester, MN, 55901, USA
- Joel G Fletcher
  - Department of Radiology, Mayo Clinic, Rochester, MN, 55901, USA
4
Erickson-Bhatt SJ, Simpson DG, Boppart SA. Statistical evaluation of reader variability in assessing the diagnostic performance of optical coherence tomography. J Biomed Opt 2020; 25:116002. [PMID: 33179459 PMCID: PMC7657413 DOI: 10.1117/1.jbo.25.11.116002] [Received: 07/24/2019] [Accepted: 10/19/2020] [Indexed: 06/11/2023]
Abstract
SIGNIFICANCE Optical coherence tomography (OCT) is widely used as a potential diagnostic tool for a variety of diseases including various types of cancer. However, sensitivity and specificity analyses of OCT in different cancers yield results varying from 11% to 100%. Hence, there is a need for more detailed statistical analysis of blinded reader studies. AIM Extensive statistical analysis is performed on results from a blinded study involving OCT of breast tumor margins to assess the impact of reader variability on sensitivity and specificity. APPROACH Five readers with varying levels of experience reading OCT images assessed 50 OCT images of breast tumor margins collected using an intraoperative OCT system. Statistical modeling and analysis was performed using the R language to analyze reader experience and variability. RESULTS Statistical analysis showed that the readers' prior experience with OCT images was directly related to the probability of the readers' assessment agreeing with histology. Additionally, results from readers with prior experience specific to OCT in breast cancer had a higher probability of agreement with histology compared to readers with experience with OCT in other (noncancer) diseases. CONCLUSIONS The results from this study demonstrate the potential impact of reader training and experience in the assessment of sensitivity and specificity. They also demonstrate even greater potential improvement in diagnostic performance by combining results from multiple readers. These preliminary findings suggest valuable directions for further study.
Affiliation(s)
- Sarah J. Erickson-Bhatt
  - University of Illinois at Urbana-Champaign, Beckman Institute for Advanced Science and Technology, Urbana, Illinois, United States
- Douglas G. Simpson
  - University of Illinois at Urbana-Champaign, Beckman Institute for Advanced Science and Technology, Urbana, Illinois, United States
  - University of Illinois at Urbana-Champaign, Department of Statistics, Champaign, Illinois, United States
- Stephen A. Boppart
  - University of Illinois at Urbana-Champaign, Beckman Institute for Advanced Science and Technology, Urbana, Illinois, United States
  - University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering, Urbana, Illinois, United States
  - University of Illinois at Urbana-Champaign, Department of Bioengineering, Urbana, Illinois, United States
  - University of Illinois at Urbana-Champaign, Carle Illinois College of Medicine, Champaign, Illinois, United States
5
Han EJ, O JH, Yoon H, Jung SE, Park G, Choi BO, Cho SG. FDG PET/CT response in diffuse large B-cell lymphoma: Reader variability and association with clinical outcome. Medicine (Baltimore) 2016; 95:e4983. [PMID: 27684851 PMCID: PMC5265944 DOI: 10.1097/md.0000000000004983] [Received: 07/12/2016] [Revised: 08/04/2016] [Accepted: 09/04/2016] [Indexed: 12/30/2022]
Abstract
F-18-fluoro-2-deoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) is essential for monitoring response to treatment in patients with diffuse large B-cell lymphoma (DLBCL) and qualitative interpretation is commonly applied in clinical practice. We aimed to evaluate the interobserver agreements of qualitative PET/CT response in patients with DLBCL and the predictive value of PET/CT results for clinical outcome. PET/CT images were obtained for patients with DLBCL 3 times: at baseline, after 3 cycles of first-line chemotherapy (interim), and after completion of chemotherapy. Two nuclear medicine physicians (with 3 and 8 years of experience with PET/CT) retrospectively assessed response to chemotherapy blinded to the clinical outcome using International Harmonization Project (IHP) criteria and Deauville 5-point score. The associations between PET/CT results and progression-free survival (PFS) and overall survival (OS) were assessed using Cox regression analysis. A total of 112 PET/CT images were included from 59 patients with DLBCL (36 male, 23 female; mean age 53 ± 14 years). Using the IHP criteria, interobserver agreement was substantial (Cohen κ = 0.76) with absolute agreement consistency of 89%. Using the Deauville score, interobserver agreement was moderate (Cohen weighted κ = 0.54) and absolute consistency was 62%. The most common cause of disagreements was discordant interpretation of residual tumor uptake. With median follow-up period of 60 months, estimated 5-year PFS and OS were 81% and 92%, respectively. Neither interim nor posttreatment PET/CT results by both readers were significantly associated with PFS. Interim PET/CT result by the more experienced reader using Deauville score was a significant factor for OS (P = 0.019). Moderate-to-substantial interobserver agreement was observed for response assessments according to qualitative PET/CT criteria, and interim PET/CT result could predict OS in patients with DLBCL. Further studies are necessary to further standardize the PET/CT-based response criteria for more consistent interpretation.
Affiliation(s)
- Eun Ji Han
  - Department of Radiology, Daejeon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Daejeon
- Seok-Goo Cho
  - Department of Hematology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
6
Kang E, Lee EJ, Jang M, Kim SM, Kim Y, Chun M, Tai JH, Han W, Kim SW, Kim JH. Reliability of Computer-Assisted Breast Density Estimation: Comparison of Interactive Thresholding, Semiautomated, and Fully Automated Methods. AJR Am J Roentgenol 2016; 207:126-34. [PMID: 27187523 DOI: 10.2214/AJR.15.15469] [Indexed: 11/18/2022]
Abstract
OBJECTIVE The purpose of this study was to investigate the reliability of computer-assisted methods of estimating breast density. MATERIALS AND METHODS Craniocaudal mammograms of 100 healthy subjects were collected from a screening mammography database. Three expert readers independently assessed mammographic breast density twice in a 1-month period using interactive thresholding and semiautomated methods. In addition, fully automated breast density estimation software was used to generate objective breast density estimates. The reliability of the computer-assisted breast density estimation was assessed in terms of concordance correlation coefficients, limits of agreement, systematic difference, and reader variability. RESULTS Statistically significant systematic bias (paired t test, p < 0.01) and variability (4.75-10.91) were found within and between readers for both the interactive thresholding and the semiautomated methods. Using the semiautomated method significantly reduced the within-reader bias of one reader (p < 0.02) and the between-reader variability of all three readers (p < 0.05). The breast density estimates obtained with the fully automated method had excellent agreement with those of the reference standard (concordance correlation coefficient, 0.93) without a significant systematic difference. CONCLUSION Reader-dependent variability and systematic bias exist in breast density estimates obtained with the interactive thresholding method, but they may be reduced in part by use of the semiautomated method. Assessing reader performance may be necessary for more reliable breast density estimation, especially for surveillance of breast density over time. The fully automated method has the potential to provide reliable breast density estimates nearly free from reader-dependent systematic bias and reader variability.
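The concordance correlation coefficient mentioned in this abstract can be illustrated with a minimal sketch of Lin's formula, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), using population (1/n) moments. The function name and sample values are hypothetical, and the abstract does not say which estimator variant the study used.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient with 1/n moments."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("coefficient is undefined for two identical constants")
    return 2 * cov_xy / denom


# Hypothetical density estimates (percent) from a method and a reference.
method = [1.0, 2.0, 3.0]
reference = [2.0, 3.0, 4.0]
ccc = concordance_correlation(method, reference)
```

Unlike Pearson correlation, the coefficient drops below 1 for a constant offset between the two series, which is why it is suited to checking agreement with a reference standard.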
7
Abdolell M, Tsuruda K, Lightfoot CB, Barkova E, McQuaid M, Caines J, Iles SE. Consistency of visual assessments of mammographic breast density from vendor-specific "for presentation" images. J Med Imaging (Bellingham) 2016; 3:011004. [PMID: 26870747 DOI: 10.1117/1.jmi.3.1.011004] [Received: 07/15/2015] [Accepted: 09/23/2015] [Indexed: 11/14/2022]
Abstract
Discussions of percent breast density (PD) and breast cancer risk implicitly assume that visual assessments of PD are comparable between vendors despite differences in technology and display algorithms. This study examines the extent to which visual assessments of PD differ between mammograms acquired from two vendors. Pairs of "for presentation" digital mammography images were obtained from two mammography units for 146 women who had a screening mammogram on one vendor unit followed by a diagnostic mammogram on a different vendor unit. Four radiologists independently visually assessed PD from single left mediolateral oblique view images from the two vendors. Analysis of variance, intra-class correlation coefficients (ICC), scatter plots, and Bland-Altman plots were used to evaluate PD assessments between vendors. The mean radiologist PD for each image was used as a consensus PD measure. Overall agreement of the PD assessments was excellent between the two vendors with an ICC of 0.95 (95% confidence interval: 0.93 to 0.97). Bland-Altman plots demonstrated narrow upper and lower limits of agreement between the vendors with only a small bias (2.3 percentage points). The results of this study support the assumption that visual assessment of PD is consistent across mammography vendors despite vendor-specific appearances of "for presentation" images.
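The Bland-Altman summary referred to in this abstract can be sketched as the mean paired difference (bias) with limits of agreement at bias ± 1.96 standard deviations of the differences. The function name and the sample values below are hypothetical; the abstract reports only the resulting bias, not the underlying data.

```python
from statistics import mean, stdev


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Limits of agreement use the conventional bias +/- 1.96 * SD of the
    paired differences, with the sample (n - 1) standard deviation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical percent-density readings of the same breasts on two units.
vendor_1 = [10, 20, 30, 40]
vendor_2 = [8, 19, 27, 38]
bias, lower, upper = bland_altman(vendor_1, vendor_2)
```

A narrow interval between `lower` and `upper` around a small bias is the pattern the abstract describes as agreement between vendors.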
Affiliation(s)
- Mohamed Abdolell
  - Dalhousie University, Faculty of Medicine, Department of Diagnostic Radiology, 1276 South Park Street, Room 3212, Dickson Building, Halifax, Nova Scotia B3H 2Y9, Canada; Nova Scotia Health Authority, Department of Diagnostic Imaging, 1276 South Park Street, Room 3212, Dickson Building, Halifax, Nova Scotia B3H 2Y9, Canada
- Kaitlyn Tsuruda
  - Nova Scotia Health Authority, Department of Diagnostic Imaging, 1276 South Park Street, Room 3212, Dickson Building, Halifax, Nova Scotia B3H 2Y9, Canada
- Christopher B Lightfoot
  - Dalhousie University, Faculty of Medicine, Department of Diagnostic Radiology, 1276 South Park Street, Room 3212, Dickson Building, Halifax, Nova Scotia B3H 2Y9, Canada; Nova Scotia Health Authority, Department of Diagnostic Imaging, 1276 South Park Street, Room 3212, Dickson Building, Halifax, Nova Scotia B3H 2Y9, Canada
- Eva Barkova
  - South Shore Regional Hospital, Department of Diagnostic Imaging, 90 Glen Allan Drive, Bridgewater, Nova Scotia B4V 3S6, Canada
- Melanie McQuaid
  - Queen Elizabeth Hospital, Department of Diagnostic Imaging, 60 Riverside Drive, PO Box 6600, Charlottetown, Prince Edward Island C1A 8T5, Canada
- Judy Caines
  - Dalhousie University, Faculty of Medicine, Department of Diagnostic Radiology, 1276 South Park Street, Room 3212, Dickson Building, Halifax, Nova Scotia B3H 2Y9, Canada; Nova Scotia Health Authority, Department of Diagnostic Imaging, 1276 South Park Street, Room 3212, Dickson Building, Halifax, Nova Scotia B3H 2Y9, Canada; Nova Scotia Breast Screening Program, 603L-7001 Mumford Road, Halifax, Nova Scotia B3L 2H8, Canada
- Sian E Iles
  - Dalhousie University, Faculty of Medicine, Department of Diagnostic Radiology, 1276 South Park Street, Room 3212, Dickson Building, Halifax, Nova Scotia B3H 2Y9, Canada; Nova Scotia Health Authority, Department of Diagnostic Imaging, 1276 South Park Street, Room 3212, Dickson Building, Halifax, Nova Scotia B3H 2Y9, Canada
8
Schalekamp S, van Ginneken B, Schaefer-Prokop CM, Karssemeijer N. Influence of study design in receiver operating characteristics studies: sequential versus independent reading. J Med Imaging (Bellingham) 2014; 1:015501. [PMID: 26158028 DOI: 10.1117/1.jmi.1.1.015501] [Received: 12/16/2013] [Revised: 01/28/2014] [Accepted: 01/29/2014] [Indexed: 11/14/2022]
Abstract
Observer studies to assess new image processing devices or computer-aided diagnosis techniques are often performed, but little is known about the effect of the study design on observer performance results. We investigated the effect of the sequential and independent reading design on observer study results with respect to reader performance and their statistical power. For this we performed an observer study for the detection of lung nodules with bone-suppressed images (BSIs) compared with original chest radiographs. In a fully crossed observer study, eight observers assessed a series of 300 radiographs four times, including one assessment of the original radiograph with sequential BSI and two independent reading sessions with BSI. Observer performance was compared using multireader multicase receiver operating characteristics. No significant difference between the effect of BSI in the sequential and the independent reading sessions could be found ([Formula: see text]; [Formula: see text]). Compared with the original radiographs, increased performance with BSI was significant in the sequential and one of the independent reading sessions ([Formula: see text]; [Formula: see text]), and nonsignificant in the other independent reading session ([Formula: see text]). A strong increase of uncorrelated variance components was found in the independent reading sessions, masking the ability to demonstrate differences in observer performance across modalities. Therefore, the sequential reading design is the preferred design because it is less burdensome and has more statistical power.
Affiliation(s)
- Steven Schalekamp
  - Radboud University, Nijmegen, Medical Center, Department of Radiology, Postbus 9101, 6500 HB Nijmegen, The Netherlands
- Bram van Ginneken
  - Radboud University, Nijmegen, Medical Center, Department of Radiology, Postbus 9101, 6500 HB Nijmegen, The Netherlands
- Cornelia M Schaefer-Prokop
  - Radboud University, Nijmegen, Medical Center, Department of Radiology, Postbus 9101, 6500 HB Nijmegen, The Netherlands; Meander Medical Center, Department of Radiology, Postbus 1502, 3800 BM Amersfoort, The Netherlands
- Nico Karssemeijer
  - Radboud University, Nijmegen, Medical Center, Department of Radiology, Postbus 9101, 6500 HB Nijmegen, The Netherlands
9
Salvi V, Karnad DR, Kerkar V, Panicker GK, Natekar M, Kothari S. Comparison of two methods of estimating reader variability in QT interval measurements in thorough QT/QTc studies. Ann Noninvasive Electrocardiol 2014; 19:182-9. [PMID: 24521536 DOI: 10.1111/anec.12136] [Indexed: 11/30/2022]
Abstract
BACKGROUND Two methods of estimating reader variability (RV) in QT measurements between 12 readers were compared. METHODS Using data from 500 electrocardiograms (ECGs) analyzed twice by 12 readers, we bootstrapped 1000 datasets each for both methods. In grouped analysis design (GAD), the same 40 ECGs were read twice by all readers. In pairwise analysis design (PAD), 40 ECGs analyzed by each reader in a clinical trial were reanalyzed by the same reader (intra-RV) and also by another reader (inter-RV); thus, variability between each pair of readers was estimated using different ECGs. RESULTS Inter-RV (mean [95% CI]) between pairs of readers by GAD and PAD was 3.9 ms (2.1-5.5 ms) and 4.1 ms (2.6-5.4 ms), respectively, using ANOVA, 0 ms (-0.0 to 0.4 ms), and 0 ms (-0.7 to 0.6 ms), respectively, by actual difference between readers and 7.7 ms (6.2-9.8 ms) and 7.7 ms (6.6-9.1 ms), respectively, by absolute difference between readers. Intra-RV too was comparable. CONCLUSIONS RV estimates by the grouped- and pairwise analysis designs are comparable.
Affiliation(s)
- Vaibhav Salvi
  - Research Section, Quintiles Cardiac Safety Services, Mumbai, India