1
Jin X, Hao Y, Hilliard J, Zhang Z, Thomas MA, Li H, Jha AK, Hugo GD. A quality assurance framework for routine monitoring of deep learning cardiac substructure computed tomography segmentation models in radiotherapy. Med Phys 2024; 51:2741-2758. PMID: 38015793. DOI: 10.1002/mp.16846.
Abstract
BACKGROUND: For autosegmentation models, the data used to train the model (e.g., public datasets and/or vendor-collected data) and the data on which the model is deployed in the clinic are typically not the same, potentially degrading model performance through a process called domain shift. Tools to routinely monitor and predict segmentation performance are needed for quality assurance. Here, we develop an approach to perform such monitoring and performance prediction for cardiac substructure segmentation. PURPOSE: To develop a quality assurance (QA) framework for routine or continuous monitoring of domain shift and of the performance of cardiac substructure autosegmentation algorithms. METHODS: A benchmark dataset consisting of computed tomography (CT) images along with manual cardiac substructure delineations of 241 breast cancer radiotherapy patients was collected, including one "normal" image domain of clean images and five "abnormal" domains containing images with artifacts (metal, contrast), pathology, or quality variations due to scanner protocol differences (field of view, noise, reconstruction kernel, and slice thickness). The QA framework consisted of an image domain shift detector operating on the input CT images, a shape quality detector operating on the output of an autosegmentation model, and a regression model for predicting autosegmentation model performance. The image domain shift detector was composed of a trained denoising autoencoder (DAE) and two hand-engineered image quality features to distinguish normal from abnormal domains in the input CT images. The shape quality detector was a variational autoencoder (VAE) trained to estimate the shape quality of the autosegmentation results. The output from the image domain shift and shape quality detectors was used to train a regression model to predict the per-patient segmentation accuracy, measured by the Dice similarity coefficient (DSC) relative to physician contours.
Different regression techniques were investigated, including linear regression, bagging, Gaussian process regression, random forest, and gradient boosting regression. Of the 241 patients, 60 were used to train the autosegmentation models, 120 to train the QA framework, and the remaining 61 to test the QA framework. A total of 19 autosegmentation models were used to evaluate QA framework performance, including 18 convolutional neural network (CNN)-based models and one transformer-based model. RESULTS: When tested on the benchmark dataset, all abnormal domains resulted in a significant DSC decrease relative to the normal domain for CNN models (p < 0.001), but only some domains did so for the transformer model. No significant relationship was found between the performance of an autosegmentation model and scanner protocol parameters (p = 0.42), except for noise (p = 0.01). CNN-based autosegmentation models demonstrated a DSC decrease ranging from 0.07 to 0.41 with added noise, while the transformer-based model was not significantly affected (ANOVA, p = 0.99). For the QA framework, linear regression models with bootstrap aggregation performed best, with a mean absolute error (MAE) of 0.041 ± 0.002 in predicted DSC (relative to the true DSC between autosegmentation and physician contours). MAE was lowest when combining both input (image) detectors and output (shape) detectors compared with output detectors alone. CONCLUSIONS: A QA framework was able to predict cardiac substructure autosegmentation model performance for clinically anticipated "abnormal" domain shifts.
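The final regression stage described above lends itself to a compact sketch. The following toy example is a minimal numpy-only implementation of bootstrap-aggregated (bagged) linear regression predicting per-patient DSC from two hypothetical detector outputs; the feature definitions, data, and coefficients are invented for illustration and are not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    # Ordinary least squares with an intercept term
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def bagged_linear_predict(X_train, y_train, X_test, n_estimators=50):
    # Bootstrap aggregation (bagging): average the predictions of linear
    # models fit on bootstrap resamples of the training set
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    preds = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X_train), size=len(X_train))
        preds.append(A_test @ fit_linear(X_train[idx], y_train[idx]))
    return np.mean(preds, axis=0)

# Toy detector outputs: [image domain-shift score, shape-quality score];
# toy target: a per-patient DSC that degrades with both scores
X_tr = rng.uniform(0, 1, (120, 2))
y_tr = 0.9 - 0.3 * X_tr[:, 0] - 0.2 * X_tr[:, 1] + rng.normal(0, 0.02, 120)
X_te = rng.uniform(0, 1, (61, 2))
y_te = 0.9 - 0.3 * X_te[:, 0] - 0.2 * X_te[:, 1] + rng.normal(0, 0.02, 61)

mae = np.mean(np.abs(bagged_linear_predict(X_tr, y_tr, X_te) - y_te))
```

The 120/61 split mirrors the train/test split reported for the QA framework; everything else is a placeholder.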
Affiliation(s)
- Xiyao Jin: Department of Radiation Oncology, Washington University in St. Louis School of Medicine, St. Louis, Missouri, USA
- Yao Hao: Department of Radiation Oncology, Washington University in St. Louis School of Medicine, St. Louis, Missouri, USA
- Jessica Hilliard: Department of Radiation Oncology, Washington University in St. Louis School of Medicine, St. Louis, Missouri, USA
- Zhehao Zhang: Department of Radiation Oncology, Washington University in St. Louis School of Medicine, St. Louis, Missouri, USA
- Maria A Thomas: Department of Radiation Oncology, Washington University in St. Louis School of Medicine, St. Louis, Missouri, USA
- Hua Li: Department of Radiation Oncology, Washington University in St. Louis School of Medicine, St. Louis, Missouri, USA
- Abhinav K Jha: Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri, USA; Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, Missouri, USA
- Geoffrey D Hugo: Department of Radiation Oncology, Washington University in St. Louis School of Medicine, St. Louis, Missouri, USA
2
Liu Z, Mhlanga JC, Xia H, Siegel BA, Jha AK. Need for Objective Task-Based Evaluation of Image Segmentation Algorithms for Quantitative PET: A Study with ACRIN 6668/RTOG 0235 Multicenter Clinical Trial Data. J Nucl Med 2024; 65:jnumed.123.266018. PMID: 38360049. PMCID: PMC10924158. DOI: 10.2967/jnumed.123.266018.
Abstract
Reliable performance of PET segmentation algorithms on clinically relevant tasks is required for their clinical translation. However, these algorithms are typically evaluated using figures of merit (FoMs) that are not explicitly designed to correlate with clinical task performance. Such FoMs include the Dice similarity coefficient (DSC), the Jaccard similarity coefficient (JSC), and the Hausdorff distance (HD). The objective of this study was to investigate whether evaluating PET segmentation algorithms using these task-agnostic FoMs yields interpretations consistent with evaluation on clinically relevant quantitative tasks. Methods: We conducted a retrospective study to assess the concordance between evaluation of segmentation algorithms using the DSC, JSC, and HD and evaluation on the tasks of estimating the metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of primary tumors from PET images of patients with non-small cell lung cancer. The PET images were collected from the American College of Radiology Imaging Network 6668/Radiation Therapy Oncology Group 0235 multicenter clinical trial. The study was conducted in two contexts: (1) evaluating conventional segmentation algorithms, namely those based on thresholding (SUVmax40% and SUVmax50%), boundary detection (Snakes), and stochastic modeling (Markov random field-Gaussian mixture model); (2) evaluating the impact of network depth and loss function on the performance of a state-of-the-art U-net-based segmentation algorithm. Results: Evaluation of conventional segmentation algorithms based on the DSC, JSC, and HD showed that SUVmax40% significantly outperformed SUVmax50%. However, SUVmax40% yielded lower accuracy on the tasks of estimating MTV and TLG, with a 51% and 54% increase, respectively, in the ensemble normalized bias. Similarly, the Markov random field-Gaussian mixture model significantly outperformed Snakes on the basis of the task-agnostic FoMs but yielded a 24% increased bias in estimated MTV.
For the U-net-based algorithm, our evaluation showed that although network depth did not significantly alter the DSC, JSC, and HD values, a deeper network yielded substantially higher accuracy in the estimated MTV and TLG, with a decreased bias of 91% and 87%, respectively. Additionally, whereas there was no significant difference in the DSC, JSC, and HD values across loss functions, the bias of the estimated MTV and TLG differed by up to 73% and 58%, respectively. Conclusion: Evaluation of PET segmentation algorithms using task-agnostic FoMs could yield findings discordant with evaluation on clinically relevant quantitative tasks. This study emphasizes the need for objective task-based evaluation of image segmentation algorithms for quantitative PET.
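For readers unfamiliar with the two quantitative tasks, MTV and TLG are straightforward to compute once a segmentation is available: MTV is the segmented volume, and TLG is MTV times the mean SUV within the segmentation. A minimal sketch with a toy volume, where a simple threshold stands in for any segmentation algorithm:

```python
import numpy as np

def mtv_tlg(suv, mask, voxel_volume_ml):
    # MTV (mL) = number of segmented voxels * voxel volume;
    # TLG = MTV * mean SUV inside the segmentation
    mtv = mask.sum() * voxel_volume_ml
    tlg = mtv * suv[mask].mean()
    return mtv, tlg

# Toy 3-D PET volume: a uniform SUV-4 "lesion" of 8 voxels on a cold background
suv = np.zeros((4, 4, 4))
suv[:2, :2, :2] = 4.0
mask = suv > 2.0          # threshold segmentation (stand-in for any algorithm)
mtv, tlg = mtv_tlg(suv, mask, voxel_volume_ml=0.5)
# mtv = 4.0 mL, tlg = 16.0
```

Note how an over- or under-segmenting mask changes MTV and TLG directly, which is why spatial-overlap FoMs alone can be discordant with these tasks.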
Affiliation(s)
- Ziping Liu: Department of Biomedical Engineering, Washington University, St. Louis, Missouri
- Joyce C Mhlanga: Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri
- Huitian Xia: Department of Biomedical Engineering, Washington University, St. Louis, Missouri
- Barry A Siegel: Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri; Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri
- Abhinav K Jha: Department of Biomedical Engineering, Washington University, St. Louis, Missouri; Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri; Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri
3
Li R, Zheng J, Zayed MA, Saffitz JE, Woodard PK, Jha AK. Carotid atherosclerotic plaque segmentation in multi-weighted MRI using a two-stage neural network: advantages of training with high-resolution imaging and histology. Front Cardiovasc Med 2023; 10:1127653. PMID: 37293278. PMCID: PMC10244753. DOI: 10.3389/fcvm.2023.1127653.
Abstract
Introduction: A reliable and automated method to segment and classify carotid artery atherosclerotic plaque components is needed to efficiently analyze multi-weighted magnetic resonance (MR) images and allow their integration into patient risk assessment for ischemic stroke. Certain plaque components, such as a lipid-rich necrotic core (LRNC) with hemorrhage, suggest a greater likelihood of plaque rupture and stroke. Assessment of the presence and extent of LRNC could help direct treatment, with impact on patient outcomes. Methods: To accurately determine the presence and extent of plaque components on carotid plaque MRI, we proposed a two-stage deep-learning-based approach consisting of a convolutional neural network (CNN) followed by a Bayesian neural network (BNN). The rationale for the two-stage approach is to account for the class imbalance between vessel wall and background by providing an attention mask to the BNN. A unique feature of the network training was the use of ground truth defined by both high-resolution ex vivo MRI data and histopathology. More specifically, standard-resolution 1.5 T in vivo MR image sets with corresponding high-resolution 3.0 T ex vivo MR image sets and histopathology image sets were used to define ground-truth segmentations. Of these, data from seven patients were used for training and data from the remaining two for testing the proposed method. Next, to evaluate the generalizability of the method, we tested it on an additional standard-resolution 3.0 T in vivo dataset of 23 patients obtained from a different scanner. Results: The proposed method yielded accurate segmentation of carotid atherosclerotic plaque and outperformed not only manual segmentation by trained readers, who did not have access to the ex vivo or histopathology data, but also three state-of-the-art deep-learning-based segmentation methods.
Further, the proposed approach outperformed a strategy in which the ground truth was generated without access to the high-resolution ex vivo MRI and histopathology. The accurate performance of the method was also observed in the additional 23-patient dataset from a different scanner. Conclusion: The proposed method provides a mechanism for accurate segmentation of carotid atherosclerotic plaque in multi-weighted MRI. Further, our study shows the advantages of using high-resolution imaging and histology to define ground truth for training deep-learning-based segmentation methods.
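The two-stage idea, a first stage supplying an attention mask that restricts the second stage to the vessel wall, can be caricatured without any neural networks. In this hypothetical sketch, intensity windows stand in for the CNN and BNN; the thresholds and component names are invented for illustration and are not from the paper:

```python
import numpy as np

# Stage 1 stand-in (a simple intensity window in place of the CNN):
# produces the vessel-wall "attention mask" passed to stage 2.
def stage1_wall_mask(image, lo=0.3, hi=0.8):
    return (image > lo) & (image < hi)

# Stage 2 stand-in (a threshold in place of the BNN): classifies plaque
# components only inside the mask, so the dominant background class
# never reaches the component classifier (the class-imbalance rationale).
def stage2_classify(image, mask):
    labels = np.zeros(image.shape, dtype=int)   # 0 = background
    labels[mask & (image >= 0.55)] = 2          # hypothetical component B
    labels[mask & (image < 0.55)] = 1           # hypothetical component A
    return labels

img = np.linspace(0, 1, 16).reshape(4, 4)       # toy "MR image"
mask = stage1_wall_mask(img)
labels = stage2_classify(img, mask)             # nonzero only inside the wall
```

The point of the sketch is purely structural: stage 2 never assigns a component label outside the stage-1 mask.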
Affiliation(s)
- Ran Li: Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, MO, United States; Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, United States
- Jie Zheng: Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, MO, United States; Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, United States
- Mohamed A. Zayed: Department of Surgery, Washington University School of Medicine in St. Louis, St. Louis, MO, United States
- Jeffrey E. Saffitz: Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, United States
- Pamela K. Woodard: Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, MO, United States; Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, United States
- Abhinav K. Jha: Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, MO, United States; Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, United States
4
Liu Z, Mhlanga JC, Siegel BA, Jha AK. Need for objective task-based evaluation of AI-based segmentation methods for quantitative PET. Proc SPIE Int Soc Opt Eng 2023; 12467:124670R. PMID: 37990707. PMCID: PMC10659582. DOI: 10.1117/12.2647894.
Abstract
Artificial intelligence (AI)-based methods are showing substantial promise in segmenting oncologic positron emission tomography (PET) images. For clinical translation of these methods, assessing their performance on clinically relevant tasks is important. However, these methods are typically evaluated using metrics that may not correlate with task performance. One such widely used metric is the Dice score, a figure of merit that measures the spatial overlap between the estimated segmentation and a reference standard (e.g., a manual segmentation). In this work, we investigated whether evaluating AI-based segmentation methods using Dice scores yields the same interpretation as evaluation on the clinical tasks of quantifying metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of the primary tumor from PET images of patients with non-small cell lung cancer. The investigation was conducted via a retrospective analysis of the ECOG-ACRIN 6668/RTOG 0235 multi-center clinical trial data. Specifically, we evaluated different structures of a commonly used AI-based segmentation method using both Dice scores and the accuracy in quantifying MTV/TLG. Our results show that evaluation using Dice scores can lead to findings that are inconsistent with evaluation using the task-based figure of merit. Thus, our study motivates the need for objective task-based evaluation of AI-based segmentation methods for quantitative PET.
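The Dice score mentioned above has a simple closed form, 2|A ∩ B| / (|A| + |B|) for binary masks A and B. A minimal implementation:

```python
import numpy as np

def dice(a, b):
    # Dice similarity coefficient between two binary masks:
    # 2 |A intersect B| / (|A| + |B|); defined as 1.0 for two empty masks
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
# overlap = 2 voxels, |A| = |B| = 3, so Dice = 2*2/6 = 2/3
```

As the abstract argues, a high Dice score does not by itself guarantee accurate MTV or TLG estimates, since overlap is insensitive to where the disagreeing voxels fall.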
Affiliation(s)
- Ziping Liu: Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, USA
- Joyce C. Mhlanga: Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA
- Barry A. Siegel: Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA; Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
- Abhinav K. Jha: Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, USA; Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA; Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
5
Hatt M, Krizsan AK, Rahmim A, Bradshaw TJ, Costa PF, Forgacs A, Seifert R, Zwanenburg A, El Naqa I, Kinahan PE, Tixier F, Jha AK, Visvikis D. Joint EANM/SNMMI guideline on radiomics in nuclear medicine: jointly supported by the EANM Physics Committee and the SNMMI Physics, Instrumentation and Data Sciences Council. Eur J Nucl Med Mol Imaging 2023; 50:352-375. PMID: 36326868. PMCID: PMC9816255. DOI: 10.1007/s00259-022-06001-6.
Abstract
PURPOSE The purpose of this guideline is to provide comprehensive information on best practices for robust radiomics analyses for both hand-crafted and deep learning-based approaches. METHODS In a cooperative effort between the EANM and SNMMI, we agreed upon current best practices and recommendations for relevant aspects of radiomics analyses, including study design, quality assurance, data collection, impact of acquisition and reconstruction, detection and segmentation, feature standardization and implementation, as well as appropriate modelling schemes, model evaluation, and interpretation. We also offer an outlook for future perspectives. CONCLUSION Radiomics is a very quickly evolving field of research. The present guideline focused on established findings as well as recommendations based on the state of the art. Though this guideline recognizes both hand-crafted and deep learning-based radiomics approaches, it primarily focuses on the former as this field is more mature. This guideline will be updated once more studies and results have contributed to improved consensus regarding the application of deep learning methods for radiomics. Although methodological recommendations in the present document are valid for most medical image modalities, we focus here on nuclear medicine, and specific recommendations when necessary are made for PET/CT, PET/MR, and quantitative SPECT.
Affiliation(s)
- M Hatt: LaTIM, INSERM, UMR 1101, Univ Brest, Brest, France
- A Rahmim: Departments of Radiology and Physics, University of British Columbia, Vancouver, BC, Canada
- T J Bradshaw: Department of Radiology, University of Wisconsin, Madison, WI, USA
- P F Costa: Department of Nuclear Medicine, West German Cancer Center, University of Duisburg-Essen and German Cancer Consortium (DKTK)-University Hospital Essen, Essen, Germany
- R Seifert: Department of Nuclear Medicine, West German Cancer Center, University of Duisburg-Essen and German Cancer Consortium (DKTK)-University Hospital Essen, Essen, Germany; Department of Nuclear Medicine, Münster University Hospital, Münster, Germany
- A Zwanenburg: OncoRay-National Center for Radiation Research in Oncology, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Helmholtz-Zentrum Dresden-Rossendorf, Dresden, Germany; National Center for Tumor Diseases (NCT/UCC), Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
- I El Naqa: Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, USA
- P E Kinahan: Imaging Research Laboratory, PET/CT Physics, Department of Radiology, UW Medical Center, University of Washington, Seattle, WA, USA
- F Tixier: LaTIM, INSERM, UMR 1101, Univ Brest, Brest, France
- A K Jha: McKelvey School of Engineering and Mallinckrodt Institute of Radiology, Washington University in St. Louis, Saint Louis, MO, USA
- D Visvikis: LaTIM, INSERM, UMR 1101, Univ Brest, Brest, France
6
Jha AK, Bradshaw TJ, Buvat I, Hatt M, Kc P, Liu C, Obuchowski NF, Saboury B, Slomka PJ, Sunderland JJ, Wahl RL, Yu Z, Zuehlsdorff S, Rahmim A, Boellaard R. Nuclear Medicine and Artificial Intelligence: Best Practices for Evaluation (the RELAINCE Guidelines). J Nucl Med 2022; 63:1288-1299. PMID: 35618476. PMCID: PMC9454473. DOI: 10.2967/jnumed.121.263239.
Abstract
An important need exists for strategies to perform rigorous objective clinical-task-based evaluation of artificial intelligence (AI) algorithms for nuclear medicine. To address this need, we propose a 4-class framework to evaluate AI algorithms for promise, technical task-specific efficacy, clinical decision making, and postdeployment efficacy. We provide best practices to evaluate AI algorithms for each of these classes. Each class of evaluation yields a claim that provides a descriptive performance of the AI algorithm. Key best practices are tabulated as the RELAINCE (Recommendations for EvaLuation of AI for NuClear medicinE) guidelines. The report was prepared by the Society of Nuclear Medicine and Molecular Imaging AI Task Force Evaluation team, which consisted of nuclear-medicine physicians, physicists, computational imaging scientists, and representatives from industry and regulatory agencies.
Affiliation(s)
- Abhinav K Jha: Department of Biomedical Engineering and Mallinckrodt Institute of Radiology, Washington University in St. Louis, Missouri
- Tyler J Bradshaw: Department of Radiology, University of Wisconsin-Madison, Madison, Wisconsin
- Irène Buvat: LITO, Institut Curie, Université PSL, U1288 Inserm, Orsay, France
- Mathieu Hatt: LaTiM, INSERM, UMR 1101, Univ Brest, Brest, France
- Prabhat Kc: Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, Maryland
- Chi Liu: Department of Radiology and Biomedical Imaging, Yale University, Connecticut
- Babak Saboury: Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Maryland
- Piotr J Slomka: Department of Imaging, Medicine, and Cardiology, Cedars-Sinai Medical Center, California
- Richard L Wahl: Mallinckrodt Institute of Radiology, Washington University in St. Louis, Missouri
- Zitong Yu: Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri
- Arman Rahmim: Departments of Radiology and Physics, University of British Columbia, Canada
- Ronald Boellaard: Department of Radiology & Nuclear Medicine, Cancer Centre Amsterdam, Amsterdam University Medical Centers, Netherlands
7
Liu Z, Moon HS, Li Z, Laforest R, Perlmutter JS, Norris SA, Jha AK. A tissue-fraction estimation-based segmentation method for quantitative dopamine transporter SPECT. Med Phys 2022; 49:5121-5137. PMID: 35635327. PMCID: PMC9703616. DOI: 10.1002/mp.15778.
Abstract
BACKGROUND: Quantitative measures of dopamine transporter (DaT) uptake in the caudate, putamen, and globus pallidus (GP), derived from DaT single-photon emission computed tomography (DaT-SPECT) images, have potential as biomarkers for measuring the severity of Parkinson's disease. Reliable quantification of this uptake requires accurate segmentation of the considered regions. However, segmentation of these regions from DaT-SPECT images is challenging, a major reason being partial-volume effects (PVEs) in SPECT. The PVEs arise from two sources, namely the limited system resolution and the reconstruction of images over finite-sized voxel grids. The limited system resolution results in blurred boundaries between the different regions. The finite voxel size leads to tissue-fraction effects (TFEs), that is, voxels containing a mixture of regions. Thus, there is an important need for methods that can account for the PVEs, including the TFEs, and accurately segment the caudate, putamen, and GP from DaT-SPECT images. PURPOSE: To design and objectively evaluate a fully automated tissue-fraction estimation-based segmentation method that segments the caudate, putamen, and GP from DaT-SPECT images. METHODS: The proposed method estimates the posterior mean of the fractional volumes occupied by the caudate, putamen, and GP within each voxel of a three-dimensional DaT-SPECT image. The estimate is obtained by minimizing a cost function based on the binary cross-entropy loss between the true and estimated fractional volumes over a population of SPECT images, where the distribution of true fractional volumes is obtained from existing populations of clinical magnetic resonance images. The method is implemented using a supervised deep-learning-based approach.
RESULTS: Evaluations using clinically guided, highly realistic simulation studies showed that the proposed method accurately segmented the caudate, putamen, and GP with high mean Dice similarity coefficients of ~0.80 and significantly outperformed (p < 0.01) all other considered segmentation methods. Further, an objective evaluation of the proposed method on the task of quantifying regional uptake showed that the method yielded reliable quantification, with a low ensemble normalized root mean square error (NRMSE) of < 20% for all the considered regions. In particular, the method yielded an even lower ensemble NRMSE of ~10% for the caudate and putamen. CONCLUSIONS: The proposed tissue-fraction estimation-based segmentation method for DaT-SPECT images demonstrated the ability to accurately segment the caudate, putamen, and GP, and to reliably quantify the uptake within these regions. The results motivate further evaluation of the method with physical-phantom and patient studies.
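The binary cross-entropy loss over fractional volumes described above differs from the usual segmentation loss in that the targets are continuous values in [0, 1] rather than hard labels. A minimal sketch of the loss; the voxel occupancies below are invented for illustration:

```python
import numpy as np

def fractional_bce(v_true, v_est, eps=1e-7):
    # Binary cross-entropy between true and estimated fractional volumes,
    # both in [0, 1], averaged over voxels/regions; eps guards log(0)
    v_est = np.clip(v_est, eps, 1 - eps)
    return -np.mean(v_true * np.log(v_est) + (1 - v_true) * np.log(1 - v_est))

# One toy voxel: fractional occupancy of caudate / putamen / GP / background
v_true = np.array([0.6, 0.3, 0.0, 0.1])
good = fractional_bce(v_true, np.array([0.58, 0.32, 0.01, 0.09]))  # near-correct
bad = fractional_bce(v_true, np.array([0.10, 0.80, 0.05, 0.05]))   # poor estimate
```

A near-correct fractional estimate incurs a lower loss than a poor one, which is what lets the network learn soft per-voxel occupancies rather than a winner-take-all label map.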
Affiliation(s)
- Ziping Liu: Department of Biomedical Engineering, Washington University, St. Louis, MO 63130, USA
- Hae Sol Moon: Department of Biomedical Engineering, Washington University, St. Louis, MO 63130, USA
- Zekun Li: Department of Biomedical Engineering, Washington University, St. Louis, MO 63130, USA
- Richard Laforest: Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Joel S. Perlmutter: Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Neurology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Scott A. Norris: Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Neurology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Abhinav K. Jha: Department of Biomedical Engineering, Washington University, St. Louis, MO 63130, USA; Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO 63110, USA
8
Liu Z, Li Z, Mhlanga JC, Siegel BA, Jha AK. No-gold-standard evaluation of quantitative imaging methods in the presence of correlated noise. Proc SPIE Int Soc Opt Eng 2022; 12035:120350M. PMID: 36465994. PMCID: PMC9717481. DOI: 10.1117/12.2605762.
Abstract
Objective evaluation of quantitative imaging (QI) methods with patient data is highly desirable, but is hindered by the lack or unreliability of an available gold standard. To address this issue, techniques that can evaluate QI methods without access to a gold standard are being actively developed. These techniques assume that the true and measured values are linearly related by a slope, bias, and Gaussian-distributed noise term, where the noise between measurements made by different methods is independent of each other. However, this noise arises in the process of measuring the same quantitative value, and thus can be correlated. To address this limitation, we propose a no-gold-standard evaluation (NGSE) technique that models this correlated noise by a multi-variate Gaussian distribution parameterized by a covariance matrix. We derive a maximum-likelihood-based approach to estimate the parameters that describe the relationship between the true and measured values, without any knowledge of the true values. We then use the estimated slopes and diagonal elements of the covariance matrix to compute the noise-to-slope ratio (NSR) to rank the QI methods on the basis of precision. The proposed NGSE technique was evaluated with multiple numerical experiments. Our results showed that the technique reliably estimated the NSR values and yielded accurate rankings of the considered methods for 83% of 160 trials. In particular, the technique correctly identified the most precise method for ∼ 97% of the trials. Overall, this study demonstrates the efficacy of the NGSE technique to accurately rank different QI methods when correlated noise is present, and without access to any knowledge of the ground truth. The results motivate further validation of this technique with realistic simulation studies and patient data.
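Given the estimated slopes and noise covariance matrix, the final ranking step described above reduces to a few lines: the noise-to-slope ratio for each method is the square root of the corresponding diagonal element of the covariance matrix divided by that method's slope. A minimal sketch; the numerical values are invented for illustration and are not from the study:

```python
import numpy as np

def noise_to_slope_ratio(slopes, covariance):
    # NSR_m = sigma_m / |a_m|: per-method noise standard deviation
    # (sqrt of the covariance diagonal) over the estimated slope.
    # Lower NSR means higher precision.
    return np.sqrt(np.diag(covariance)) / np.abs(np.asarray(slopes))

# Illustrative NGSE outputs for three hypothetical QI methods
slopes = [1.0, 0.9, 1.1]
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

nsr = noise_to_slope_ratio(slopes, cov)
ranking = np.argsort(nsr)   # most precise method first
```

Note that the off-diagonal covariance terms (the correlated noise the NGSE technique models) do not enter the NSR itself; they matter in the maximum-likelihood estimation that produces the slopes and covariance in the first place.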
Affiliation(s)
- Ziping Liu: Department of Biomedical Engineering, Washington University, St. Louis, MO, USA
- Zekun Li: Department of Biomedical Engineering, Washington University, St. Louis, MO, USA
- Joyce C. Mhlanga: Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA
- Barry A. Siegel: Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA
- Abhinav K. Jha: Department of Biomedical Engineering, Washington University, St. Louis, MO, USA; Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA
9
Jha AK, Myers KJ, Obuchowski NA, Liu Z, Rahman MA, Saboury B, Rahmim A, Siegel BA. Objective Task-Based Evaluation of Artificial Intelligence-Based Medical Imaging Methods: Framework, Strategies, and Role of the Physician. PET Clin 2021; 16:493-511. PMID: 34537127. DOI: 10.1016/j.cpet.2021.06.013.
Abstract
Artificial intelligence-based methods are showing promise in medical imaging applications. There is substantial interest in clinical translation of these methods, requiring that they be evaluated rigorously. We lay out a framework for objective task-based evaluation of artificial intelligence methods. We provide a list of available tools to conduct this evaluation. We outline the important role of physicians in conducting these evaluation studies. The examples in this article are proposed in the context of PET scans with a focus on evaluating neural network-based methods. However, the framework is also applicable to evaluate other medical imaging modalities and other types of artificial intelligence methods.
Affiliation(s)
- Abhinav K Jha
- Department of Biomedical Engineering, Mallinckrodt Institute of Radiology, Alvin J. Siteman Cancer Center, Washington University in St. Louis, 510 S Kingshighway Boulevard, St Louis, MO 63110, USA
- Kyle J Myers
- Division of Imaging, Diagnostics, and Software Reliability, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration (FDA), Silver Spring, MD, USA
- Ziping Liu
- Department of Biomedical Engineering, Washington University in St. Louis, 1 Brookings Drive, St Louis, MO 63130, USA
- Md Ashequr Rahman
- Department of Biomedical Engineering, Washington University in St. Louis, 1 Brookings Drive, St Louis, MO 63130, USA
- Babak Saboury
- Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Arman Rahmim
- Department of Radiology, Department of Physics, University of British Columbia, BC Cancer, BC Cancer Research Institute, 675 West 10th Avenue, Office 6-112, Vancouver, British Columbia V5Z 1L3, Canada
- Barry A Siegel
- Division of Nuclear Medicine, Mallinckrodt Institute of Radiology, Alvin J. Siteman Cancer Center, Washington University School of Medicine, 510 S Kingshighway Boulevard #956, St Louis, MO 63110, USA
10
Yousefirizi F, Jha AK, Brosch-Lenz J, Saboury B, Rahmim A. Toward High-Throughput Artificial Intelligence-Based Segmentation in Oncological PET Imaging. PET Clin 2021; 16:577-596. [PMID: 34537131 DOI: 10.1016/j.cpet.2021.06.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Artificial intelligence (AI) techniques for image-based segmentation have garnered much attention in recent years. Convolutional neural networks have shown impressive results and potential toward fully automated segmentation in medical imaging, and particularly PET imaging. To cope with the limited access to annotated data needed in supervised AI methods, given tedious and prone-to-error manual delineations, semi-supervised and unsupervised AI techniques have also been explored for segmentation of tumors or normal organs in single- and bimodality scans. This work reviews existing AI techniques for segmentation tasks and the evaluation criteria for translational AI-based segmentation efforts toward routine adoption in clinical workflows.
Affiliation(s)
- Fereshteh Yousefirizi
- Department of Integrative Oncology, BC Cancer Research Institute, 675 West 10th Avenue, Vancouver, British Columbia V5Z 1L3, Canada
- Abhinav K Jha
- Department of Biomedical Engineering, Washington University in St. Louis, St Louis, MO 63130, USA; Mallinckrodt Institute of Radiology, Washington University School of Medicine, St Louis, MO 63110, USA
- Julia Brosch-Lenz
- Department of Integrative Oncology, BC Cancer Research Institute, 675 West 10th Avenue, Vancouver, British Columbia V5Z 1L3, Canada
- Babak Saboury
- Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20892, USA; Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, MD, USA; Department of Radiology, Hospital of the University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA 19104, USA
- Arman Rahmim
- Department of Radiology, University of British Columbia, BC Cancer, BC Cancer Research Institute, 675 West 10th Avenue, Office 6-112, Vancouver, British Columbia V5Z 1L3, Canada; Department of Physics, University of British Columbia, BC Cancer, BC Cancer Research Institute, 675 West 10th Avenue, Office 6-112, Vancouver, British Columbia V5Z 1L3, Canada
11
Liu Z, Mhlanga JC, Laforest R, Derenoncourt PR, Siegel BA, Jha AK. A Bayesian approach to tissue-fraction estimation for oncological PET segmentation. Phys Med Biol 2021; 66. [PMID: 34125078 PMCID: PMC8765116 DOI: 10.1088/1361-6560/ac01f4] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2020] [Accepted: 05/17/2021] [Indexed: 01/06/2023]
Abstract
Tumor segmentation in oncological PET is challenging, a major reason being the partial-volume effects (PVEs) that arise due to low system resolution and finite voxel size. The latter results in tissue-fraction effects (TFEs), i.e., a voxel may contain a mixture of tissue classes. Conventional segmentation methods are typically designed to assign each image voxel as belonging to a certain tissue class. Thus, these methods are inherently limited in modeling TFEs. To address the challenge of accounting for PVEs, and in particular, TFEs, we propose a Bayesian approach to tissue-fraction estimation for oncological PET segmentation. Specifically, this Bayesian approach estimates the posterior mean of the fractional volume that the tumor occupies within each image voxel. The proposed method, implemented using a deep-learning-based technique, was first evaluated using clinically realistic 2D simulation studies with known ground truth, in the context of segmenting the primary tumor in PET images of patients with lung cancer. The evaluation studies demonstrated that the method accurately estimated the tumor-fraction areas and significantly outperformed widely used conventional PET segmentation methods, including a U-net-based method, on the task of segmenting the tumor. In addition, the proposed method was relatively insensitive to PVEs and yielded reliable tumor segmentation for different clinical-scanner configurations. The method was then evaluated using clinical images of patients with stage IIB/III non-small cell lung cancer from the ACRIN 6668/RTOG 0235 multi-center clinical trial. Here, the results showed that the proposed method significantly outperformed all other considered methods and yielded accurate tumor segmentation on patient images with Dice similarity coefficient (DSC) of 0.82 (95% CI: 0.78, 0.86). In particular, the method accurately segmented relatively small tumors, yielding a high DSC of 0.77 for the smallest segmented cross-section of 1.30 cm2. Overall, this study demonstrates the efficacy of the proposed method to accurately segment tumors in PET images.
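The central quantity in this approach, the posterior mean of the tumor fraction within a voxel, can be illustrated with a toy one-voxel model. The intensities, noise level, and uniform prior below are hypothetical (the paper learns this estimator with a deep network rather than computing it on a grid):

```python
import numpy as np

def posterior_mean_fraction(v, t_mean=10.0, b_mean=2.0, sigma=1.0, n=1001):
    # Toy voxel model: intensity v = f*t_mean + (1 - f)*b_mean + Gaussian noise,
    # where f is the tumor fraction with a uniform prior on [0, 1].
    f = np.linspace(0.0, 1.0, n)
    mu = f * t_mean + (1.0 - f) * b_mean
    w = np.exp(-0.5 * ((v - mu) / sigma) ** 2)  # likelihood x uniform prior
    return float((f * w).sum() / w.sum())       # posterior mean E[f | v]
```

A voxel whose intensity lies midway between the pure-background and pure-tumor means gets an estimated fraction near 0.5, and the estimate grows smoothly with intensity, rather than snapping to a hard 0/1 label as in conventional voxel-classification segmentation.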
Affiliation(s)
- Ziping Liu
- Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO 63130, United States of America
- Joyce C Mhlanga
- Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO 63110, United States of America
- Richard Laforest
- Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO 63110, United States of America
- Paul-Robert Derenoncourt
- Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO 63110, United States of America
- Barry A Siegel
- Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO 63110, United States of America
- Abhinav K Jha
- Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO 63130, United States of America; Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO 63110, United States of America
12
|
Leung KH, Marashdeh W, Wray R, Ashrafinia S, Pomper MG, Rahmim A, Jha AK. A physics-guided modular deep-learning based automated framework for tumor segmentation in PET. Phys Med Biol 2020; 65:245032. [PMID: 32235059 DOI: 10.1088/1361-6560/ab8535] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
Abstract
An important need exists for reliable positron emission tomography (PET) tumor-segmentation methods for tasks such as PET-based radiation-therapy planning and reliable quantification of volumetric and radiomic features. To address this need, we propose an automated physics-guided deep-learning-based three-module framework to segment PET images on a per-slice basis. The framework is designed to help address the challenges of limited spatial resolution and lack of clinical training data with known ground-truth tumor boundaries in PET. The first module generates PET images containing highly realistic tumors with known ground truth using a new stochastic and physics-based approach, addressing the lack of training data. The second module trains a modified U-net using these images, helping it learn the tumor-segmentation task. The third module fine-tunes this network using a small-sized clinical dataset with radiologist-defined delineations as surrogate ground truth, helping the framework learn features potentially missed in simulated tumors. The framework was evaluated in the context of segmenting primary tumors in 18F-fluorodeoxyglucose (FDG)-PET images of patients with lung cancer. The framework's accuracy, generalizability to different scanners, sensitivity to partial volume effects (PVEs), and efficacy in reducing the number of training images were quantitatively evaluated using the Dice similarity coefficient (DSC) and several other metrics. The framework yielded reliable performance in both simulated (DSC: 0.87 (95% confidence interval (CI): 0.86, 0.88)) and patient images (DSC: 0.73 (95% CI: 0.71, 0.76)), outperformed several widely used semi-automated approaches, accurately segmented relatively small tumors (smallest segmented cross-section was 1.83 cm2), generalized across five PET scanners (DSC: 0.74 (95% CI: 0.71, 0.76)), was relatively unaffected by PVEs, and required relatively little training data (training with data from even 30 patients yielded DSC of 0.70 (95% CI: 0.68, 0.71)). In conclusion, the proposed automated physics-guided deep-learning-based PET-segmentation framework yielded reliable performance in delineating tumors in FDG-PET images of patients with lung cancer.
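The Dice similarity coefficient reported throughout these studies is straightforward to compute from binary masks; a minimal sketch (not the authors' implementation):

```python
import numpy as np

def dice_coefficient(pred, truth):
    # DSC = 2|A intersect B| / (|A| + |B|) for binary segmentation masks
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)
```

For example, masks [1, 1, 0, 0] and [0, 1, 1, 0] overlap in one voxel out of two apiece, giving a DSC of 0.5.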
Affiliation(s)
- Kevin H Leung
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- The Russell H. Morgan Department of Radiology, Johns Hopkins University, Baltimore, MD, United States of America
- Wael Marashdeh
- Department of Radiology and Nuclear Medicine, Jordan University of Science and Technology, Ar Ramtha, Jordan
- Rick Wray
- Memorial Sloan Kettering Cancer Center, Greater New York City Area, NY, United States of America
- Saeed Ashrafinia
- The Russell H. Morgan Department of Radiology, Johns Hopkins University, Baltimore, MD, United States of America
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- Martin G Pomper
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- The Russell H. Morgan Department of Radiology, Johns Hopkins University, Baltimore, MD, United States of America
- Arman Rahmim
- The Russell H. Morgan Department of Radiology, Johns Hopkins University, Baltimore, MD, United States of America
- Departments of Radiology and Physics, University of British Columbia, Vancouver, BC, Canada
- Abhinav K Jha
- Department of Biomedical Engineering and Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, MO, United States of America
13
Naji N, Sun H, Wilman AH. On the value of QSM from MPRAGE for segmenting and quantifying iron-rich deep gray matter. Magn Reson Med 2020; 84:1486-1500. [DOI: 10.1002/mrm.28226] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2019] [Revised: 01/20/2020] [Accepted: 02/03/2020] [Indexed: 01/10/2023]
Affiliation(s)
- Nashwan Naji
- Department of Biomedical Engineering, University of Alberta, Edmonton, Alberta, Canada
- Hongfu Sun
- School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, Queensland, Australia
- Alan H. Wilman
- Department of Biomedical Engineering, University of Alberta, Edmonton, Alberta, Canada
14
Osadebey M, Pedersen M, Arnold D, Wendel-Mitoraj K. Bayesian framework inspired no-reference region-of-interest quality measure for brain MRI images. J Med Imaging (Bellingham) 2017. [PMID: 28630885 DOI: 10.1117/1.jmi.4.2.025504] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
We describe a postacquisition, attribute-based quality assessment method for brain magnetic resonance imaging (MRI) images. It is based on the application of Bayes' theorem to the relationship between entropy and image quality attributes. The entropy feature image of a slice is segmented into low- and high-entropy regions. For each entropy region, there are three separate observations of contrast, standard deviation, and sharpness quality attributes. A quality index for a quality attribute is the posterior probability of an entropy region given any corresponding region in a feature image where the quality attribute is observed. Prior belief in each entropy region is determined from the normalized total clique potential (TCP) energy of the slice. For TCP below the predefined threshold, the prior probability for a region is determined by the deviation of its percentage composition in the slice from a standard normal distribution built from 250 MRI volumes provided by the Alzheimer's Disease Neuroimaging Initiative. For TCP above the threshold, the prior is computed using a mathematical model that describes the TCP-noise level relationship in brain MRI images. Our proposed method assesses the image quality of each entropy region and the global image. Experimental results demonstrate good correlation with subjective opinions of radiologists for different types and levels of quality distortions.
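The quality index described here is a posterior probability obtained from Bayes' rule. A minimal sketch with hypothetical prior and likelihood values (not taken from the paper):

```python
def quality_index(prior, likelihood):
    # Posterior P(entropy region | observed attribute) via Bayes' rule:
    # posterior_i is proportional to likelihood_i * prior_i, normalized over regions
    joint = [p * l for p, l in zip(prior, likelihood)]
    z = sum(joint)
    return [j / z for j in joint]

# Hypothetical numbers: prior belief in (low-, high-)entropy regions derived
# from TCP energy, and likelihood of the observed contrast feature per region
post = quality_index([0.6, 0.4], [0.2, 0.7])
```

With these numbers the posterior shifts toward the high-entropy region (0.3 vs. 0.7) because the observed attribute is far more likely under it, despite the lower prior.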
Affiliation(s)
- Michael Osadebey
- NeuroRx Research Inc., MRI Reader Group, Montreal, Québec, Canada
- Marius Pedersen
- Norwegian University of Science and Technology, Department of Computer Science, Gjøvik, Norway
15
Jha AK, Frey E. No-gold-standard evaluation of image-acquisition methods using patient data. Proc SPIE Int Soc Opt Eng 2017; 10136. [PMID: 28596636 DOI: 10.1117/12.2255902] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
Several new and improved modalities, scanners, and protocols, together referred to as image-acquisition methods (IAMs), are being developed to provide reliable quantitative imaging. Objective evaluation of these IAMs on clinically relevant quantitative tasks is highly desirable. Such evaluation is most reliable and clinically decisive when performed with patient data, but that requires the availability of a gold standard, which is often rare. While no-gold-standard (NGS) techniques have been developed to clinically evaluate quantitative imaging methods, these techniques require that each of the patients be scanned using all the IAMs, which is expensive, time consuming, and could lead to increased radiation dose. A more clinically practical scenario is one where different sets of patients are scanned using different IAMs. We have developed an NGS technique that compares IAMs using patient data in which different patient sets are imaged with different IAMs. The technique posits a linear relationship, characterized by a slope, bias, and noise standard-deviation term, between the true and measured quantitative values. Under the assumption that the true quantitative values have been sampled from a unimodal distribution, a maximum-likelihood procedure was developed that estimates these linear relationship parameters for the different IAMs. Figures of merit can be estimated using these linear relationship parameters to evaluate the IAMs on the basis of accuracy, precision, and overall reliability. The proposed technique has several potential applications such as in protocol optimization, quantifying differences in system performance, and system harmonization using patient data.
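Why the measured values alone carry information about the IAM parameters can be seen from the posited linear model: if the true values follow an assumed unimodal distribution, the moments of the measured distribution are determined by the slope, bias, and noise terms. A simulation sketch with hypothetical parameter values (an illustration of the forward model, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, tau = 5.0, 1.5            # assumed unimodal truth distribution N(mu, tau^2)
a, b, sigma = 1.05, 0.2, 0.3  # hypothetical IAM slope, bias, and noise std

truth = rng.normal(mu, tau, size=200_000)
measured = a * truth + b + rng.normal(0.0, sigma, size=truth.size)

# Under the linear model, the measured moments satisfy:
# E[y] = a*mu + b,  Var[y] = a^2 * tau^2 + sigma^2
pred_mean = a * mu + b
pred_var = a**2 * tau**2 + sigma**2
```

The sample mean and variance of `measured` match `pred_mean` and `pred_var` to within sampling error, which is the kind of constraint a maximum-likelihood NGS procedure exploits to recover the parameters without ground truth.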
Affiliation(s)
- Abhinav K Jha
- Department of Radiology, Johns Hopkins University, Baltimore, MD, USA
- Eric Frey
- Department of Radiology, Johns Hopkins University, Baltimore, MD, USA