Layer-Wise Relevance Propagation for Explaining Deep Neural Network Decisions in MRI-Based Alzheimer's Disease Classification

Moritz Böhle et al. Front Aging Neurosci. 2019 Jul 31;11:194. doi: 10.3389/fnagi.2019.00194. eCollection 2019.

Abstract

Deep neural networks have led to state-of-the-art results in many medical imaging tasks including Alzheimer's disease (AD) detection based on structural magnetic resonance imaging (MRI) data. However, the network decisions are often perceived as being highly non-transparent, making it difficult to apply these algorithms in clinical routine. In this study, we propose using layer-wise relevance propagation (LRP) to visualize convolutional neural network decisions for AD based on MRI data. Similarly to other visualization methods, LRP produces a heatmap in the input space indicating the importance/relevance of each voxel contributing to the final classification outcome. In contrast to susceptibility maps produced by guided backpropagation ("Which change in voxels would change the outcome most?"), the LRP method is able to directly highlight positive contributions to the network classification in the input space. In particular, we show that (1) the LRP method is very specific for individuals ("Why does this person have AD?") with high inter-patient variability, (2) there is very little relevance for AD in healthy controls and (3) areas that exhibit a lot of relevance correlate well with what is known from literature. To quantify the latter, we compute size-corrected metrics of the summed relevance per brain area, e.g., relevance density or relevance gain. Although these metrics produce very individual "fingerprints" of relevance patterns for AD patients, a lot of importance is put on areas in the temporal lobe including the hippocampus. After discussing several limitations such as sensitivity toward the underlying model and computation parameters, we conclude that LRP might have a high potential to assist clinicians in explaining neural network decisions for diagnosing AD (and potentially other diseases) based on structural MRI data.
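
The figure captions below refer to LRP heatmaps computed with a parameter β. For orientation, the following NumPy sketch shows the underlying β-rule (the αβ-rule with α = 1 + β) for a single dense layer; the function name, the toy layer, and the bias handling are illustrative assumptions and do not reproduce the authors' 3D convolutional network implementation.

    import numpy as np

    def lrp_beta_dense(a, W, b, R_out, beta=0.0, eps=1e-9):
        # Propagate relevance R_out (one value per output unit) back through a
        # dense layer z = a @ W + b with the LRP beta-rule (alpha = 1 + beta);
        # beta = 0 keeps only positive contributions, larger beta mixes in
        # negative ones (illustrative sketch, not the authors' code).
        z = a[:, None] * W                        # contribution of input j to output k
        z_pos = np.clip(z, 0.0, None)             # positive parts
        z_neg = np.clip(z, None, 0.0)             # negative parts
        denom_pos = z_pos.sum(axis=0) + np.clip(b, 0.0, None) + eps
        denom_neg = z_neg.sum(axis=0) + np.clip(b, None, 0.0) - eps
        return ((1.0 + beta) * z_pos / denom_pos - beta * z_neg / denom_neg) @ R_out

    # Toy usage: 5 input units, 3 output units, all relevance on output unit 1.
    rng = np.random.default_rng(0)
    a, W, b = rng.random(5), rng.standard_normal((5, 3)), rng.standard_normal(3)
    print(lrp_beta_dense(a, W, b, R_out=np.array([0.0, 1.0, 0.0]), beta=0.5))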

Keywords: Alzheimer's disease; MRI; convolutional neural networks (CNN); deep learning; explainability; layer-wise relevance propagation; visualization.


Figures

Figure 1
Illustration of the benefit of visualization in a deep learning framework for diagnosing Alzheimer's disease (AD) based on structural MRI data. Deep neural networks are often criticized for being non-transparent, since they usually provide only one single class score as output and do not explain what has led to this particular network decision; in this example, the MRI input is classified as belonging to the group of AD patients with a probability of 89%. When no further information is given, the medical expert is not able to base any medical treatment on this number, since the underlying reasons are unclear. The layer-wise relevance propagation method (LRP) might alleviate this problem by additionally providing a heatmap in which the positive contributions to the class score (89% AD) are highlighted. Here, the class score is supplemented by the additional information that in this particular subject AD relevance has been found in the hippocampus, an area known to be affected in AD. By providing a visual explanation, the LRP framework allows the medical expert to make a much more informed decision.
Figure 2
Average heatmaps for AD patients and healthy controls (HCs) in the test set are shown separately for LRP with β = 0, 0.5, 1 (Left) and GB (Right). The scale for the heatmap is chosen relative to the average AD patient heatmap for LRP and GB respectively. Hence, values in the average heatmaps that are higher than the 50th percentile and lower than the 99.5th percentile are linearly color-coded as shown on the scale. Values below (above) these numbers are black (white).
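
A small matplotlib sketch of this percentile-anchored color scaling is given below; the array names, the 2D slice input, and the "hot" colormap are assumptions for illustration only.

    import numpy as np
    import matplotlib.pyplot as plt

    def show_heatmap_slice(heatmap_slice, avg_ad_heatmap, lo_pct=50.0, hi_pct=99.5):
        # Anchor the color range to percentiles of the average AD-patient heatmap:
        # values below vmin render as the lowest color (black), values above vmax
        # as the highest color (white), matching the scaling described above.
        vmin = np.percentile(avg_ad_heatmap, lo_pct)
        vmax = np.percentile(avg_ad_heatmap, hi_pct)
        plt.imshow(heatmap_slice, cmap="hot", vmin=vmin, vmax=vmax)
        plt.colorbar()
        plt.show()
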
Figure 3
The average heatmaps over all subjects in the test set are plotted for the following cases (Left to Right): true positives, false positives, true negatives, and false negatives; separately for LRP with β = 0 (Left) and GB (Right). For each heatmap, the color-coding is the same as in Figure 2, i.e., with all values smaller than the 50th percentile of the average AD patient in black, increasing values going over red to yellow, and all values greater than the 99.5th percentile in white.
Figure 4
The absolute sum of relevance (LRP, Top) and the absolute sum of susceptibility (GB, Bottom) are shown for different brain areas. Susceptibility refers to the absolute value of the GB gradients. Only the top 25 most important areas under this metric are shown for LRP and GB, respectively. The circles show the average sum for each area over all AD patients (orange) and all healthy controls (HCs, green) in the test set. Because this metric scales linearly with the size of the corresponding brain area, it is strongly correlated with brain area size.
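
A minimal sketch of this per-area summation, assuming an integer-labeled atlas volume aligned with the heatmap and a label-to-name mapping; the variable names are illustrative, not the authors' code.

    import numpy as np

    def summed_relevance_per_region(heatmap, atlas, region_names):
        # heatmap:      3D relevance (LRP) or susceptibility (|GB gradient|) volume
        # atlas:        integer-labeled parcellation volume, same shape as heatmap
        # region_names: {label: region name}
        # Returns {region name: summed absolute value}, as plotted per area above.
        return {name: float(np.abs(heatmap[atlas == label]).sum())
                for label, name in region_names.items()}
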
Figure 5
Size-normalized relevance (LRP, Top) and size-normalized susceptibility (GB, Bottom) are shown for different brain areas. Only the top 25 most important areas under this metric are shown for LRP and GB, respectively. We show the average density for all AD patients (orange circles) and all healthy controls (HCs, green circles) in the test set, along with a density estimation of the distribution of values per area (orange and green shaded areas for AD and HCs, respectively). Moreover, two patients were selected to emphasize the diversity of relevance distributions for LRP: among all patients classified as AD with a class score >90%, the pair with the largest cosine distance to each other in the relevance-density space of the 25 areas was chosen.
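
A sketch of the size-normalized relevance (relevance density) and of the cosine-distance criterion used to pick the two most dissimilar high-confidence patients; array and function names are assumptions for illustration.

    import numpy as np
    from scipy.spatial.distance import cosine

    def relevance_density(heatmap, atlas, labels):
        # Size-normalized relevance: summed |relevance| per region divided by
        # the region's size in voxels.
        return np.array([np.abs(heatmap[atlas == l]).sum() / np.count_nonzero(atlas == l)
                         for l in labels])

    def most_dissimilar_pair(density_vectors):
        # Indices of the two subjects whose relevance-density vectors have the
        # largest pairwise cosine distance (candidates: AD class score > 90%).
        best, pair = -1.0, (0, 0)
        for i in range(len(density_vectors)):
            for j in range(i + 1, len(density_vectors)):
                d = cosine(density_vectors[i], density_vectors[j])
                if d > best:
                    best, pair = d, (i, j)
        return pair
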
Figure 6
Gain of relevance (LRP, Top) and gain of susceptibility (GB, Bottom) are shown for different brain areas. The gain per area is defined as the average sum of relevance (LRP) or susceptibility (GB) in a given area divided by the average sum in this area over all healthy controls (HCs) in the test set. Again, only the top 25 most important areas under this metric are shown for LRP and GB, respectively. To provide an estimate of the gain in correctly classified subjects, we show here the means and density estimations only for true positive (TP) and true negative (TN) classifications. As an additional visual aid, the identity gain (gain of 1) is shown as a dashed line.
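
The gain metric defined in the caption can be sketched as follows, assuming per-area relevance (or susceptibility) sums for one subject and for all healthy controls; the variable names are illustrative.

    import numpy as np

    def relevance_gain(subject_area_sums, hc_area_sums):
        # subject_area_sums: (n_areas,) summed relevance per area for one subject
        # hc_area_sums:      (n_hc, n_areas) summed relevance per area for all HCs
        # Gain > 1 means the subject carries more relevance in that area than the
        # average healthy control in the test set.
        hc_mean = hc_area_sums.mean(axis=0)
        return subject_area_sums / np.where(hc_mean > 0, hc_mean, np.nan)
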
Figure 7
Comparison of the effect of different β values on the regional ordering in Figures 4–6. The intersection between the top 10 regions of the three metrics is shown for different LRP β values in %.
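
A sketch of this overlap measure, assuming each metric yields an ordered list of region labels (most to least important); names are illustrative.

    def top_k_overlap(ranking_a, ranking_b, k=10):
        # Percentage of regions shared by the top-k of two metric rankings,
        # e.g., the orderings obtained with different LRP beta values.
        return 100.0 * len(set(ranking_a[:k]) & set(ranking_b[:k])) / k
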
Figure 8
Three brain slices are shown for patient A and patient B, whose individual slopes in relevance density are shown in Figure 5. The highlighted areas are the hippocampus, temporal pole, amygdala, parahippocampal gyrus, medial temporal gyrus (MTG), superior temporal gyrus (STG), triangular part of the inferior frontal gyrus (TrIFG), and frontal pole. The scale for the heatmap is chosen relative to the average AD patient heatmap. Hence, values in the individual patients that are higher than the 90th percentile and lower than the 99.5th percentile are linearly color-coded as shown on the scale. Values below (above) these numbers are transparent (yellow).
Figure 9
Correlation between hippocampal volume and LRP relevance/GB susceptibility in the hippocampus for correctly classified AD patients (true positives; Left: LRP, Right: GB). For illustration, we additionally show the false positive classifications.
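
A sketch of this correlation analysis, assuming one hippocampal volume and one summed hippocampal relevance (or susceptibility) value per true-positive patient; names are illustrative.

    import numpy as np
    from scipy.stats import pearsonr

    def hippocampus_volume_relevance_corr(hippo_volumes, hippo_relevance_sums):
        # Pearson correlation between hippocampal volume and the summed LRP
        # relevance (or GB susceptibility) in the hippocampus across patients.
        return pearsonr(np.asarray(hippo_volumes, dtype=float),
                        np.asarray(hippo_relevance_sums, dtype=float))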
