Horsch K, Pesce LL, Giger ML, Metz CE, Jiang Y. A scaling transformation for classifier output based on likelihood ratio: applications to a CAD workstation for diagnosis of breast cancer.
Med Phys 2012;
39:2787-804. [PMID:
22559651 DOI:
10.1118/1.3700168]
[Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open
Abstract
PURPOSE
The authors developed scaling methods that monotonically transform the output of one classifier to the "scale" of another. Such transformations affect the distribution of classifier output while leaving the ROC curve unchanged. In particular, they investigated transformations between radiologists and computer classifiers, with the goal of addressing the problem of comparing and interpreting case-specific values of output from two classifiers.
METHODS
Using both simulated and radiologists' rating data of breast imaging cases, the authors investigated a likelihood-ratio-scaling transformation, based on "matching" classifier likelihood ratios. For comparison, three other scaling transformations were investigated that were based on matching classifier true positive fraction, false positive fraction, or cumulative distribution function, respectively. The authors explored modifying the computer output to reflect the scale of the radiologist, as well as modifying the radiologist's ratings to reflect the scale of the computer. They also evaluated how dataset size affects the transformations.
RESULTS
When ROC curves of two classifiers differed substantially, the four transformations were found to be quite different. The likelihood-ratio scaling transformation was found to vary widely from radiologist to radiologist. Similar results were found for the other transformations. Our simulations explored the effect of database sizes on the accuracy of the estimation of our scaling transformations.
CONCLUSIONS
The likelihood-ratio-scaling transformation that the authors have developed and evaluated was shown to be capable of transforming computer and radiologist outputs to a common scale reliably, thereby allowing the comparison of the computer and radiologist outputs on the basis of a clinically relevant statistic.
Collapse