1
Makeev A, Li K, Anastasio MA, Emig A, Jahnke P, Glick SJ. Automated assessment of task-based performance of digital mammography and tomosynthesis systems using an anthropomorphic breast phantom and deep learning-based scoring. J Med Imaging (Bellingham) 2025; 12:S13005. [PMID: 39416764 PMCID: PMC11474246 DOI: 10.1117/1.jmi.12.s1.s13005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Revised: 09/12/2024] [Accepted: 09/23/2024] [Indexed: 10/19/2024] Open
Abstract
Purpose: Conventional metrics used for assessing digital mammography (DM) and digital breast tomosynthesis (DBT) image quality, including noise, spatial resolution, and detective quantum efficiency, do not necessarily predict how well the system will perform in a clinical task. Several existing phantom-based methods have their own limitations, such as unrealistic uniform backgrounds, subjective scoring by human readers, and regular signal patterns unrepresentative of common clinical findings. We attempted to address this problem with a realistic breast phantom with random hydroxyapatite microcalcifications and semi-automated deep learning-based image scoring. Our goal was to develop a methodology for objective task-based assessment of image quality for tomosynthesis and DM systems, which includes an anthropomorphic phantom, a detection task (microcalcification clusters), and automated performance evaluation using a convolutional neural network.
Approach: Experimental 2D and pseudo-3D mammograms of an anthropomorphic inkjet-printed breast phantom with inserted microcalcification clusters were collected on clinical mammography systems to train a signal-present/signal-absent image classifier based on the ResNet-18 architecture. In a separate validation study using simulations, this ResNet-18 classifier was shown to approach the performance of an ideal observer. Microcalcification detection performance was evaluated at four dose levels using receiver operating characteristic (ROC) analysis [i.e., area under the ROC curve (AUC)]. To demonstrate the use of this evaluation approach for assessing different technologies, the method was applied to two different mammography systems, as well as to mammograms with re-binned pixels emulating a lower-resolution X-ray detector.
Results: Microcalcification detectability, as assessed by the deep learning classifier, was observed to vary with the exposure incident on the breast phantom for both DM and tomosynthesis. At full dose, experimental AUC was 0.96 (for DM) and 0.95 (for DBT), whereas at half dose, it dropped to 0.85 and 0.71, respectively. AUC on DM decreased significantly with the larger effective pixel size obtained by re-binning. The task-based assessment approach also showed the superiority of a newer mammography system compared with an older system.
Conclusions: An objective task-based methodology for assessing the image quality of mammography and tomosynthesis systems is proposed. Possible uses for this tool include quality control, acceptance, and constancy testing; assessing the safety and effectiveness of new technology for regulatory submissions; and system optimization. The results from this study showed that the proposed evaluation method using a deep learning model observer can track differences in microcalcification signal detectability with varied exposure conditions.
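The AUC figure of merit quoted above can be estimated directly from classifier outputs. The following is a minimal sketch, assuming the classifier returns one scalar score per image; the function name and the score values are hypothetical and are not taken from the study. It uses the Mann-Whitney form of the empirical AUC, i.e., the fraction of (signal-present, signal-absent) pairs that are ranked correctly, with ties counted as one half.

```python
def auc_mann_whitney(present_scores, absent_scores):
    """Empirical AUC: fraction of (present, absent) score pairs ranked correctly.

    Ties between a signal-present and a signal-absent score count as 1/2.
    """
    wins = 0.0
    for p in present_scores:
        for a in absent_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(present_scores) * len(absent_scores))


# Hypothetical classifier scores (illustrative values only, not study data).
present = [0.9, 0.8, 0.7, 0.4]
absent = [0.6, 0.3, 0.2, 0.1]

auc = auc_mann_whitney(present, absent)
print(auc)
```

Repeating such a computation per dose level would give one AUC value per exposure condition, which is the kind of comparison the abstract reports.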
Affiliation(s)
- Andrey Makeev
  - U.S. Food & Drug Administration, Silver Spring, Maryland, United States
- Kaiyan Li
  - University of Illinois Urbana-Champaign, Urbana, Illinois, United States
- Mark A. Anastasio
  - University of Illinois Urbana-Champaign, Urbana, Illinois, United States
- Arthur Emig
  - Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Department of Radiology, Berlin, Germany
- Paul Jahnke
  - Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Department of Radiology, Berlin, Germany
  - Berlin Institute of Health, Berlin, Germany
- Stephen J. Glick
  - U.S. Food & Drug Administration, Silver Spring, Maryland, United States
2
Shao M, Byrd DW, Mitra J, Behnia F, Lee JH, Iravani A, Sadic M, Chen DL, Wollenweber SD, Abbey CK, Kinahan PE, Ahn S. A deep learning anthropomorphic model observer for a detection task in PET. Med Phys 2024; 51:7093-7107. [PMID: 39008812 PMCID: PMC11725380 DOI: 10.1002/mp.17303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 05/15/2024] [Accepted: 06/24/2024] [Indexed: 07/17/2024] Open
Abstract
BACKGROUND: Lesion detection is one of the most important clinical tasks in positron emission tomography (PET) for oncology. An anthropomorphic model observer (MO) designed to replicate human observers (HOs) in a detection task is an important tool for assessing task-based image quality. The channelized Hotelling observer (CHO) has been the most popular anthropomorphic MO. Recently, deep learning MOs (DLMOs), mostly based on convolutional neural networks (CNNs), have been investigated for various imaging modalities. However, there have been few studies on DLMOs for PET.
PURPOSE: The goal of the study is to investigate whether DLMOs can predict HOs better than conventional MOs such as CHO in a two-alternative forced-choice (2AFC) detection task using PET images with real anatomical variability.
METHODS: Two types of DLMOs were implemented: (1) CNN DLMO, and (2) CNN-SwinT DLMO that combines CNN and Swin Transformer (SwinT) encoders. Lesion-absent PET images were reconstructed from clinical data, and lesion-present images were reconstructed after adding simulated lesion sinogram data. Lesion-present and lesion-absent PET image pairs were labeled by eight HOs consisting of four radiologists and four image scientists in a 2AFC detection task. In total, 2268 pairs of lesion-present and lesion-absent images were used for training, 324 pairs for validation, and 324 pairs for testing. CNN DLMO, CNN-SwinT DLMO, CHO with internal noise, and non-prewhitening matched filter (NPWMF) were compared in the same train-test paradigm. For comparison, six quantitative metrics, including prediction accuracy, mean squared errors (MSEs), and correlation coefficients, which measure how well an MO predicts HOs, were calculated in a 9-fold cross-validation experiment.
RESULTS: In terms of the accuracy and MSE metrics, CNN DLMO and CNN-SwinT DLMO showed better performance than CHO and NPWMF, and CNN-SwinT DLMO showed the best performance among the MOs evaluated.
CONCLUSIONS: DLMO can predict HOs more accurately than conventional MOs such as CHO in PET lesion detection. Combining SwinT and CNN encoders can improve the DLMO prediction performance compared to using CNN only.
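To make the 2AFC setting concrete, the sketch below shows one simple way a model observer's decisions can be scored against the truth and against a human observer. It is an illustration under stated assumptions, not the authors' evaluation code: the decision rule (pick the image with the larger test statistic), the function names, and all numbers are hypothetical, and the abstract does not define its six metrics in enough detail to reproduce them here.

```python
def two_afc_choices(stat_first, stat_second):
    """For each image pair, choose index 0 if the first image has the larger
    (or equal) test statistic, otherwise index 1."""
    return [0 if a >= b else 1 for a, b in zip(stat_first, stat_second)]


def fraction_equal(xs, ys):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical model-observer test statistics for four image pairs.
mo_first = [2.1, 0.3, 1.5, 0.2]
mo_second = [0.4, 1.9, 1.7, 0.9]

truth = [0, 1, 0, 1]  # index of the truly lesion-present image in each pair
human = [0, 1, 1, 1]  # hypothetical human-observer choices

mo = two_afc_choices(mo_first, mo_second)
proportion_correct = fraction_equal(mo, truth)   # task performance of the MO
human_agreement = fraction_equal(mo, human)      # how well the MO predicts the HO
print(mo, proportion_correct, human_agreement)
```

The distinction between the two scores matters for an anthropomorphic observer: the aim is agreement with human choices, including human errors, rather than maximal task performance.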
Affiliation(s)
- Muhan Shao
  - GE HealthCare Technology & Innovation Center, Niskayuna, New York 12309, USA
- Darrin W. Byrd
  - Department of Radiology, University of Washington, Seattle, Washington 98195, USA
- Jhimli Mitra
  - GE HealthCare Technology & Innovation Center, Niskayuna, New York 12309, USA
- Fatemeh Behnia
  - Department of Radiology, University of Washington, Seattle, Washington 98195, USA
- Jean H. Lee
  - Department of Radiology, University of Washington, Seattle, Washington 98195, USA
- Amir Iravani
  - Department of Radiology, University of Washington, Seattle, Washington 98195, USA
- Murat Sadic
  - Department of Radiology, University of Washington, Seattle, Washington 98195, USA
- Delphine L. Chen
  - Department of Radiology, University of Washington, Seattle, Washington 98195, USA
- Craig K. Abbey
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California 93106, USA
- Paul E. Kinahan
  - Department of Radiology, University of Washington, Seattle, Washington 98195, USA
- Sangtae Ahn
  - GE HealthCare Technology & Innovation Center, Niskayuna, New York 12309, USA
3
Li K, Li H, Anastasio MA. Investigating the use of signal detection information in supervised learning-based image denoising with consideration of task-shift. J Med Imaging (Bellingham) 2024; 11:055501. [PMID: 39247217 PMCID: PMC11376226 DOI: 10.1117/1.jmi.11.5.055501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 07/26/2024] [Accepted: 08/09/2024] [Indexed: 09/10/2024] Open
Abstract
Purpose: Recently, learning-based denoising methods that incorporate task-relevant information into the training procedure have been developed to enhance the utility of the denoised images. However, this line of research is relatively new and underdeveloped, and some fundamental issues remain unexplored. Our purpose is to yield insights into general issues related to these task-informed methods. This includes understanding the impact of denoising on objective measures of image quality (IQ) when the specified task at inference time is different from that employed for model training, a phenomenon we refer to as "task-shift."
Approach: A virtual imaging test bed comprising a stylized computational model of a chest X-ray computed tomography imaging system was employed to enable a controlled and tractable study design. A canonical, fully supervised, convolutional neural network-based denoising method was purposely adopted to understand the underlying issues that may be relevant to a variety of applications and more advanced denoising or image reconstruction methods. Signal detection and signal detection-localization tasks under signal-known-statistically with background-known-statistically conditions were considered, and several distinct types of numerical observers were employed to compute estimates of the task performance. Studies were designed to reveal how a task-informed transfer-learning approach can influence the tradeoff between conventional and task-based measures of image quality within the context of the considered tasks. In addition, the impact of task-shift on these image quality measures was assessed.
Results: The results indicated that certain tradeoffs can be achieved such that the resulting AUC value was significantly improved and the degradation of physical IQ measures was statistically insignificant. It was also observed that introducing task-shift degrades the task performance as expected. The degradation was significant when a relatively simple task was considered for network training and observer performance on a more complex one was assessed at inference time.
Conclusions: The presented results indicate that the task-informed training method can improve the observer performance while providing control over the tradeoff between traditional and task-based measures of image quality. The behavior of a task-informed model fine-tuning procedure was demonstrated, and the impact of task-shift on task-based image quality measures was investigated.
Affiliation(s)
- Kaiyan Li
  - University of Illinois Urbana-Champaign, Department of Bioengineering, Urbana, Illinois, United States
- Hua Li
  - University of Illinois Urbana-Champaign, Department of Bioengineering, Urbana, Illinois, United States
  - Washington University School of Medicine in St. Louis, Department of Radiation Oncology, Saint Louis, Missouri, United States
- Mark A. Anastasio
  - University of Illinois Urbana-Champaign, Department of Bioengineering, Urbana, Illinois, United States
4
Herman JD, Roca RE, O’Neill AG, Wong ML, Goud Lingala S, Pineda AR. Task-based assessment for neural networks: evaluating undersampled MRI reconstructions based on human observer signal detection. J Med Imaging (Bellingham) 2024; 11:045503. [PMID: 39144582 PMCID: PMC11321363 DOI: 10.1117/1.jmi.11.4.045503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 06/17/2024] [Accepted: 07/22/2024] [Indexed: 08/16/2024] Open
Abstract
Purpose: Recent research explores using neural networks to reconstruct undersampled magnetic resonance imaging. Because of the complexity of the artifacts in the reconstructed images, there is a need to develop task-based approaches to image quality. We compared conventional global quantitative metrics for evaluating image quality in undersampled images generated by a neural network with human observer performance in a detection task. The purpose is to study which acceleration (2×, 3×, 4×, 5×) would be chosen with the conventional metrics and compare it to the acceleration chosen by human observer performance.
Approach: We used common global metrics for evaluating image quality: the normalized root mean squared error (NRMSE) and structural similarity (SSIM). These metrics are compared with a measure of image quality that incorporates a subtle signal for a specific task, allowing an image quality assessment that locally evaluates the effect of undersampling on a signal. We used a U-Net to reconstruct undersampled images with 2×, 3×, 4×, and 5× one-dimensional undersampling rates. Cross-validation was performed for a 500- and a 4000-image training set with both SSIM and MSE losses. A two-alternative forced choice (2-AFC) observer study was carried out for detecting a subtle signal (small blurred disk) from images with the 4000-image training set.
Results: We found that for both loss functions, the human observer performance on the 2-AFC studies led to a choice of a 2× undersampling, but the SSIM and NRMSE led to a choice of a 3× undersampling.
Conclusions: For this detection task using a subtle small signal at the edge of detectability, SSIM and NRMSE overestimated the undersampling achievable with a U-Net before a steep loss of image quality, among the 2×, 3×, 4×, and 5× undersampling rates, when compared with the performance of human observers in the detection task.
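As an illustration of why a global metric can miss a small, local signal, the sketch below computes an NRMSE over a whole image. The normalization is an assumption: NRMSE conventions vary (range, mean, or norm of the reference), and the abstract does not say which one was used; here the RMSE is divided by the intensity range of the reference. The function name and pixel values are hypothetical.

```python
import math


def nrmse(reference, reconstruction):
    """RMSE between two equal-length pixel lists, normalized by the
    intensity range of the reference (one of several common conventions)."""
    n = len(reference)
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / n
    span = max(reference) - min(reference)
    return math.sqrt(mse) / span


# Hypothetical flattened images (illustrative values only).
reference = [0.0, 1.0, 2.0, 4.0]
reconstruction = [0.0, 1.0, 2.0, 2.0]

print(nrmse(reference, reconstruction))
```

Because the error is averaged over every pixel, a reconstruction that erases a subtle signal occupying only a few pixels changes such a score very little, which is consistent with the mismatch between the global metrics and human detection performance reported above.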
Affiliation(s)
- Joshua D. Herman
  - Manhattan College, Department of Mathematics, The Bronx, New York, United States
- Rachel E. Roca
  - Manhattan College, Department of Mathematics, The Bronx, New York, United States
- Alexandra G. O’Neill
  - Manhattan College, Department of Mathematics, The Bronx, New York, United States
- Marcus L. Wong
  - Manhattan College, Department of Mathematics, The Bronx, New York, United States
- Sajan Goud Lingala
  - University of Iowa, Roy J. Carver Department of Biomedical Engineering, Iowa City, Iowa, United States
- Angel R. Pineda
  - Manhattan College, Department of Mathematics, The Bronx, New York, United States
5
Zhou W, Villa U, Anastasio MA. Ideal Observer Computation by Use of Markov-Chain Monte Carlo With Generative Adversarial Networks. IEEE Trans Med Imaging 2023; 42:3715-3724. [PMID: 37578916 PMCID: PMC10769588 DOI: 10.1109/tmi.2023.3304907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/16/2023]
Abstract
Medical imaging systems are often evaluated and optimized via objective, or task-specific, measures of image quality (IQ) that quantify the performance of an observer on a specific clinically-relevant task. The performance of the Bayesian Ideal Observer (IO) sets an upper limit among all observers, numerical or human, and has been advocated for use as a figure-of-merit (FOM) for evaluating and optimizing medical imaging systems. However, the IO test statistic corresponds to the likelihood ratio that is intractable to compute in the majority of cases. A sampling-based method that employs Markov-chain Monte Carlo (MCMC) techniques was previously proposed to estimate the IO performance. However, current applications of MCMC methods for IO approximation have been limited to a small number of situations where the considered distribution of to-be-imaged objects can be described by a relatively simple stochastic object model (SOM). As such, there remains an important need to extend the domain of applicability of MCMC methods to address a large variety of scenarios where IO-based assessments are needed but the associated SOMs have not been available. In this study, a novel MCMC method that employs a generative adversarial network (GAN)-based SOM, referred to as MCMC-GAN, is described and evaluated. The MCMC-GAN method was quantitatively validated by use of test-cases for which reference solutions were available. The results demonstrate that the MCMC-GAN method can extend the domain of applicability of MCMC methods for conducting IO analyses of medical imaging systems.
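For orientation, the likelihood-ratio test statistic mentioned above, and the general form of a sampling-based estimate for a binary detection task with a known signal and a random background, can be written as follows. The notation is an assumption introduced here and does not come from the abstract: $\mathbf{g}$ is the measured image, $H_0$ and $H_1$ are the signal-absent and signal-present hypotheses, $\mathbf{b}$ is the background, and $\Lambda_{\mathrm{BKE}}$ is the likelihood ratio for a known background. How the samples $\mathbf{b}^{(j)}$ are drawn when the object model is a GAN is specific to the paper and is not reproduced here.

```latex
\Lambda(\mathbf{g}) = \frac{p(\mathbf{g} \mid H_1)}{p(\mathbf{g} \mid H_0)}
  = \int \Lambda_{\mathrm{BKE}}(\mathbf{g} \mid \mathbf{b})\,
         p(\mathbf{b} \mid \mathbf{g}, H_0)\, d\mathbf{b}
  \approx \frac{1}{J} \sum_{j=1}^{J} \Lambda_{\mathrm{BKE}}\!\left(\mathbf{g} \mid \mathbf{b}^{(j)}\right),
\qquad \mathbf{b}^{(j)} \sim p(\mathbf{b} \mid \mathbf{g}, H_0)
```

The integral is generally intractable, which is why the posterior $p(\mathbf{b} \mid \mathbf{g}, H_0)$ is sampled with a Markov chain and the integrand is averaged over the $J$ samples.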
6
Patwari M, Gutjahr R, Marcus R, Thali Y, Calvarons AF, Raupach R, Maier A. Reducing the risk of hallucinations with interpretable deep learning models for low-dose CT denoising: comparative performance analysis. Phys Med Biol 2023; 68:19LT01. [PMID: 37733068 DOI: 10.1088/1361-6560/acfc11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 09/21/2023] [Indexed: 09/22/2023]
Abstract
Objective: Reducing CT radiation dose is an often proposed measure to enhance patient safety, which, however, results in increased image noise, translating into degradation of clinical image quality. Several deep learning methods have been proposed for low-dose CT (LDCT) denoising. The high risks posed by possible hallucinations in clinical images necessitate methods which aid the interpretation of deep learning networks. In this study, we aim to use qualitative reader studies and quantitative radiomics studies to assess the perceived quality, signal preservation, and statistical feature preservation of LDCT volumes denoised by deep learning. We aim to compare interpretable deep learning methods with classical deep neural networks in clinical denoising performance.
Approach: We conducted an image quality analysis study to assess the perceived image quality of the denoised volumes based on four criteria. We subsequently conducted a lesion detection/segmentation study to assess the impact of denoising on signal detectability. Finally, a radiomic analysis study was performed to observe the quantitative and statistical similarity of the denoised images to standard dose CT (SDCT) images.
Main results: The use of specific deep learning-based algorithms generates denoised volumes which are qualitatively inferior to SDCT volumes (p < 0.05). Contrary to previous literature, denoising the volumes did not reduce the accuracy of the segmentation (p > 0.05). The denoised volumes, in most cases, generated radiomics features which were statistically similar to those generated from SDCT volumes (p > 0.05).
Significance: Our results show that the denoised volumes have a lower perceived quality than SDCT volumes. Noise and denoising do not significantly affect detectability of the abdominal lesions. Denoised volumes also contain statistically identical features to SDCT volumes.
Affiliation(s)
- Mayank Patwari
  - Pattern Recognition Lab, Friedrich-Alexander Universität Erlangen-Nürnberg, D-91058 Erlangen, Germany
  - CT Concepts, Siemens Healthineers AG, D-91301 Forchheim, Germany
- Ralf Gutjahr
  - CT Concepts, Siemens Healthineers AG, D-91301 Forchheim, Germany
- Roy Marcus
  - Balgrist University Hospital Zurich, 8008 Zurich, Switzerland
  - Faculty of Medicine, University of Zurich, 8032 Zurich, Switzerland
  - Cantonal Hospital of Lucerne, 6016 Lucerne, Switzerland
- Yannick Thali
  - Spital Zofingen AG, 4800 Zofingen, Switzerland
  - Cantonal Hospital of Lucerne, 6016 Lucerne, Switzerland
- Rainer Raupach
  - CT Concepts, Siemens Healthineers AG, D-91301 Forchheim, Germany
- Andreas Maier
  - Pattern Recognition Lab, Friedrich-Alexander Universität Erlangen-Nürnberg, D-91058 Erlangen, Germany
7
Li K, Zhou W, Li H, Anastasio MA. A Hybrid Approach for Approximating the Ideal Observer for Joint Signal Detection and Estimation Tasks by Use of Supervised Learning and Markov-Chain Monte Carlo Methods. IEEE Trans Med Imaging 2022; 41:1114-1124. [PMID: 34898433 PMCID: PMC9128572 DOI: 10.1109/tmi.2021.3135147] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
The ideal observer (IO) sets an upper performance limit among all observers and has been advocated for assessing and optimizing imaging systems. For general joint detection and estimation (detection-estimation) tasks, estimation ROC (EROC) analysis has been established for evaluating the performance of observers. However, in general, it is difficult to accurately approximate the IO that maximizes the area under the EROC curve. In this study, a hybrid method that employs machine learning is proposed to accomplish this. Specifically, a hybrid approach is developed that combines a multi-task convolutional neural network and a Markov-Chain Monte Carlo (MCMC) method in order to approximate the IO for detection-estimation tasks. Unlike traditional MCMC methods, the hybrid method is not limited to use of specific utility functions. In addition, a purely supervised learning-based sub-ideal observer is proposed. Computer-simulation studies are conducted to validate the proposed method, which include signal-known-statistically/background-known-exactly and signal-known-statistically/background-known-statistically tasks. The EROC curves produced by the proposed method are compared to those produced by the MCMC approach or analytical computation when feasible. The proposed method provides a new approach for approximating the IO and may advance the application of EROC analysis for optimizing imaging systems.
8
Zhou W, Bhadra S, Brooks FJ, Li H, Anastasio MA. Learning stochastic object models from medical imaging measurements by use of advanced ambient generative adversarial networks. J Med Imaging (Bellingham) 2022; 9:015503. [PMID: 35229009 PMCID: PMC8866417 DOI: 10.1117/1.jmi.9.1.015503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2021] [Accepted: 02/07/2022] [Indexed: 11/14/2022] Open
Abstract
Purpose: To objectively assess new medical imaging technologies via computer-simulations, it is important to account for the variability in the ensemble of objects to be imaged. This source of variability can be described by stochastic object models (SOMs). It is generally desirable to establish SOMs from experimental imaging measurements acquired by use of a well-characterized imaging system, but this task has remained challenging.
Approach: A generative adversarial network (GAN)-based method that employs AmbientGANs with modern progressive or multiresolution training approaches is proposed. AmbientGANs established using the proposed training procedure are systematically validated in a controlled way using computer-simulated magnetic resonance imaging (MRI) data corresponding to a stylized imaging system. Emulated single-coil experimental MRI data are also employed to demonstrate the methods under less stylized conditions.
Results: The proposed AmbientGAN method can generate clean images when the imaging measurements are contaminated by measurement noise. When the imaging measurement data are incomplete, the proposed AmbientGAN can reliably learn the distribution of the measurement components of the objects.
Conclusions: Both visual examinations and quantitative analyses, including task-specific validations using the Hotelling observer, demonstrated that the proposed AmbientGAN method holds promise to establish realistic SOMs from imaging measurements.
Affiliation(s)
- Weimin Zhou
  - University of California Santa Barbara, Department of Psychological and Brain Sciences, Santa Barbara, California, United States
- Sayantan Bhadra
  - Washington University in St. Louis, Department of Computer Science and Engineering, St. Louis, Missouri, United States
- Frank J. Brooks
  - University of Illinois at Urbana-Champaign, Department of Bioengineering, Urbana, Illinois, United States
- Hua Li
  - University of Illinois at Urbana-Champaign, Department of Bioengineering, Urbana, Illinois, United States
  - Washington University in St. Louis, Department of Radiation Oncology, St. Louis, Missouri, United States
  - Cancer Center at Illinois, Urbana, Illinois, United States
- Mark A. Anastasio
  - University of Illinois at Urbana-Champaign, Department of Bioengineering, Urbana, Illinois, United States