Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Aug:253:108238.
doi: 10.1016/j.cmpb.2024.108238. Epub 2024 May 28.

On the evaluation of deep learning interpretability methods for medical images under the scope of faithfulness

Affiliations

On the evaluation of deep learning interpretability methods for medical images under the scope of faithfulness

Vangelis Lamprou et al. Comput Methods Programs Biomed. 2024 Aug.

Abstract

Background and objective: Evaluating the interpretability of Deep Learning models is crucial for building trust and gaining insights into their decision-making processes. In this work, we employ class activation map based attribution methods in a setting where only High-Resolution Class Activation Mapping (HiResCAM) is known to produce faithful explanations. The objective is to evaluate the quality of the attribution maps using quantitative metrics and investigate whether faithfulness aligns with the metrics results.

Methods: We fine-tune pre-trained deep learning architectures over four medical image datasets in order to calculate attribution maps. The maps are evaluated on a threefold metrics basis utilizing well-established evaluation scores.

Results: Our experimental findings suggest that the Area Over Perturbation Curve (AOPC) and Max-Sensitivity scores favor the HiResCAM maps. On the other hand, the Heatmap Assisted Accuracy Score (HAAS) does not provide insights to our comparison as it evaluates almost all maps as inaccurate. To this purpose we further compare our calculated values against values obtained over a diverse group of models which are trained on non-medical benchmark datasets, to eventually achieve more responsive results.

Conclusion: This study develops a series of experiments to discuss the connection between faithfulness and quantitative metrics over medical attribution maps. HiResCAM preserves the gradient effect on a pixel level ultimately producing high-resolution, informative and resilient mappings. In turn, this is depicted in the results of AOPC and Max-Sensitivity metrics, successfully identifying the faithful algorithm. In regards to HAAS, our experiments yield that it is sensitive over complex medical patterns, commonly characterized by strong color dependency and multiple attention areas.

Keywords: Attribution maps; Deep learning; Evaluation metrics; Faithfulness; Medical images.

PubMed Disclaimer

Conflict of interest statement

Declaration of competing interest The authors declare that they have no conflicts of interest.

LinkOut - more resources