Explainable AI to improve acceptance of convolutional neural networks for automatic classification of dopamine transporter SPECT in the diagnosis of clinically uncertain parkinsonian syndromes.
Eur J Nucl Med Mol Imaging 2021;
49:1176-1186. [PMID:
34651223 PMCID:
PMC8921148 DOI:
10.1007/s00259-021-05569-9]
[Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2021] [Accepted: 09/17/2021] [Indexed: 12/31/2022]
Abstract
Purpose
Deep convolutional neural networks (CNN) provide high accuracy for automatic classification of dopamine transporter (DAT) SPECT images. However, CNN are inherently black-box in nature lacking any kind of explanation for their decisions. This limits their acceptance for clinical use. This study tested layer-wise relevance propagation (LRP) to explain CNN-based classification of DAT-SPECT in patients with clinically uncertain parkinsonian syndromes.
Methods
The study retrospectively included 1296 clinical DAT-SPECT with visual binary interpretation as “normal” or “reduced” by two experienced readers as standard-of-truth. A custom-made CNN was trained with 1008 randomly selected DAT-SPECT. The remaining 288 DAT-SPECT were used to assess classification performance of the CNN and to test LRP for explanation of the CNN-based classification.
Results
Overall accuracy, sensitivity, and specificity of the CNN were 95.8%, 92.8%, and 98.7%, respectively. LRP provided relevance maps that were easy to interpret in each individual DAT-SPECT. In particular, the putamen in the hemisphere most affected by nigrostriatal degeneration was the most relevant brain region for CNN-based classification in all reduced DAT-SPECT. Some misclassified DAT-SPECT showed an “inconsistent” relevance map more typical for the true class label.
Conclusion
LRP is useful to provide explanation of CNN-based decisions in individual DAT-SPECT and, therefore, can be recommended to support CNN-based classification of DAT-SPECT in clinical routine. Total computation time of 3 s is compatible with busy clinical workflow. The utility of “inconsistent” relevance maps to identify misclassified cases requires further investigation.
Supplementary Information
The online version contains supplementary material available at 10.1007/s00259-021-05569-9.
Collapse