On the evaluation of deep learning interpretability methods for medical images under the scope of faithfulness

Vangelis Lamprou¹, Athanasios Kallipolitis², Ilias Maglogiannis¹

Affiliations

¹ Department of Digital Systems, University of Piraeus, 80, M. Karaoli & A. Dimitriou St, Piraeus 18534, Greece.
² Department of Digital Systems, University of Piraeus, 80, M. Karaoli & A. Dimitriou St, Piraeus 18534, Greece. Electronic address: nasskall@unipi.gr.

PMID: 38823117
DOI: 10.1016/j.cmpb.2024.108238

Comput Methods Programs Biomed. 2024 Aug;253:108238. doi: 10.1016/j.cmpb.2024.108238. Epub 2024 May 28.

Abstract

Background and objective: Evaluating the interpretability of deep learning models is crucial for building trust in them and for gaining insight into their decision-making processes. In this work, we apply class activation map based attribution methods in a setting where only High-Resolution Class Activation Mapping (HiResCAM) is known to produce faithful explanations. The objective is to evaluate the quality of the attribution maps using quantitative metrics and to investigate whether faithfulness aligns with the metric results.

Methods: We fine-tune pre-trained deep learning architectures on four medical image datasets and compute attribution maps from the resulting models. The maps are evaluated on a threefold basis using well-established evaluation metrics.

Results: Our experimental findings suggest that the Area Over Perturbation Curve (AOPC) and Max-Sensitivity scores favor the HiResCAM maps. The Heatmap Assisted Accuracy Score (HAAS), by contrast, does not inform our comparison, as it rates almost all maps as inaccurate. We therefore further compare our computed values against values obtained from a diverse group of models trained on non-medical benchmark datasets, in order to obtain more responsive results.
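As an illustration of how a perturbation-based score such as AOPC can separate attribution maps, the following is a minimal sketch of the commonly used "most relevant first" formulation: patches are removed in decreasing order of attribution, and the drop in the model score is averaged over the perturbation steps. The abstract does not specify the exact variant used in the study, so the function name `aopc_morf`, the patch size, the fill value and the toy model below are all assumptions made for illustration only.

```python
import numpy as np

def aopc_morf(model_fn, image, attribution, patch=4, steps=2, fill=0.0):
    """Sketch of AOPC with most-relevant-first patch removal (assumed variant).

    model_fn    : callable returning a scalar class score for an image (hypothetical)
    image       : array of shape (H, W) or (H, W, C)
    attribution : array of shape (H, W), higher means more relevant
    """
    h, w = attribution.shape
    gh, gw = h // patch, w // patch
    # Sum the attribution inside each non-overlapping patch.
    scores = (attribution[:gh * patch, :gw * patch]
              .reshape(gh, patch, gw, patch)
              .sum(axis=(1, 3)))
    order = np.argsort(scores, axis=None)[::-1]  # most relevant patch first

    perturbed = image.copy()
    base = model_fn(perturbed)
    drops = [0.0]  # step 0: nothing removed yet
    for idx in order[:steps]:
        r, c = divmod(int(idx), gw)
        perturbed[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = fill
        drops.append(base - model_fn(perturbed))
    return float(np.mean(drops))

# Toy setup (assumption): the "model" only looks at the top-left 4x4 patch.
toy_model = lambda x: float(x[:4, :4].mean())
toy_image = np.ones((8, 8))

# A map that ranks the truly used patch first ...
good_map = np.zeros((8, 8))
good_map[:4, :4], good_map[:4, 4:], good_map[4:, :4] = 3.0, 2.0, 1.0
# ... and a map that ranks it last.
bad_map = np.zeros((8, 8))
bad_map[4:, 4:], bad_map[4:, :4], bad_map[:4, 4:] = 3.0, 2.0, 1.0

good_score = aopc_morf(toy_model, toy_image, good_map)
bad_score = aopc_morf(toy_model, toy_image, bad_map)
```

In this toy setting the map that points at the region the model actually uses receives the higher AOPC, which is the behavior a faithfulness-sensitive metric is expected to show. In practice the score is typically averaged over many images.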

Conclusion: This study develops a series of experiments to examine the connection between faithfulness and quantitative metrics for medical attribution maps. HiResCAM preserves the gradient effect at the pixel level, ultimately producing high-resolution, informative and resilient maps. This is reflected in the AOPC and Max-Sensitivity results, which successfully identify the faithful algorithm. Regarding HAAS, our experiments indicate that it is sensitive to complex medical patterns, which are commonly characterized by strong color dependency and multiple attention areas.
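The remark that HiResCAM preserves the gradient effect at the pixel level can be made concrete by comparing it with Grad-CAM. The sketch below follows the general definitions of the two methods as they are usually stated: Grad-CAM averages each channel's gradient over space before weighting the activations, whereas HiResCAM multiplies gradients and activations element-wise and sums over channels. It is not taken from the paper itself. The function names and the toy arrays are hypothetical, and post-processing such as upsampling or normalization is omitted.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM sketch: one spatially averaged gradient weight per channel.

    activations, gradients : arrays of shape (C, H, W) from the same layer.
    """
    weights = gradients.mean(axis=(1, 2))              # shape (C,)
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    return np.maximum(cam, 0.0)                        # ReLU, as in Grad-CAM

def hires_cam(activations, gradients):
    """HiResCAM sketch: element-wise product, summed over channels."""
    return (gradients * activations).sum(axis=0)       # shape (H, W)

# Toy example (assumption): one channel whose gradient changes sign across space.
toy_activations = np.ones((1, 2, 2))
toy_gradients = np.array([[[1.0, -1.0],
                           [1.0, -1.0]]])

gc_map = grad_cam(toy_activations, toy_gradients)
hc_map = hires_cam(toy_activations, toy_gradients)
```

Here the spatial averaging in Grad-CAM cancels the positive and negative gradients, so its map is all zeros, while HiResCAM keeps the per-location sign and magnitude. This is one simple way to see why the two methods can disagree about which regions matter.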

Keywords: Attribution maps; Deep learning; Evaluation metrics; Faithfulness; Medical images.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no conflicts of interest.

LinkOut - more resources

Full Text Sources
- ClinicalKey
- Elsevier Science
