Artif Intell Rev. 2024;57(9):240.
doi: 10.1007/s10462-024-10852-w. Epub 2024 Aug 9.

A review of evaluation approaches for explainable AI with applications in cardiology


Ahmed M Salih et al. Artif Intell Rev. 2024.

Abstract

Explainable artificial intelligence (XAI) elucidates the decision-making process of complex AI models and is important in building trust in model predictions. XAI explanations themselves require evaluation for accuracy and reasonableness, in the context in which the underlying AI model is used. This review details the evaluation of XAI in cardiac AI applications and finds that, of the studies examined, 37% evaluated XAI quality using literature results, 11% used clinicians as domain experts, 11% used proxies or statistical analysis, and the remaining 43% did not assess the XAI used at all. We aim to inspire additional studies within healthcare, urging researchers not only to apply XAI methods but to systematically assess the resulting explanations, as a step towards developing trustworthy and safe models.
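
To make the review's scope concrete, the sketch below shows how a post-hoc explanation such as SHAP is typically produced for a tabular cardiac-style classifier. It is a minimal illustration, not code from the paper: the synthetic data, the random-forest model, and the use of the shap library are assumptions standing in for the varied pipelines the review surveys.

    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for tabular cardiac data (e.g., EHR features);
    # purely illustrative, not data from the review.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)

    # TreeExplainer computes Shapley-value attributions for tree ensembles;
    # for classifiers, older shap versions return one array per class.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_test)

Producing such attributions is the easy part; the review's central question is how, and whether, studies then evaluate them.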

Supplementary information: The online version contains supplementary material available at 10.1007/s10462-024-10852-w.

Keywords: AI; Cardiac; Evaluation; XAI.


Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1
General illustration. MRI magnetic resonance imaging, PDP partial dependence plot, ALE accumulated local effects, Grad-CAM gradient-weighted class activation mapping, LIME local interpretable model-agnostic explanations, SHAP Shapley additive explanations, ROAR RemOve And Retrain, ERASER evaluating rationales and simple English reasoning. Created with BioRender.com

Fig. 2
Workflow adhering to PRISMA guidelines, detailing the exclusion and inclusion criteria used in the search process, along with the final number of papers considered in the review

Fig. 3
Data modalities used in cardiac studies. A All cardiac studies, B cardiac studies that applied proxy-grounded evaluation approaches, C cardiac studies that applied an expert-grounded evaluation approach, D cardiac studies that applied a literature-grounded evaluation approach, E cardiac studies that did not apply any kind of evaluation to XAI outcomes. ECG electrocardiography, EHR electronic health records, CMR cardiac magnetic resonance imaging, CT computed tomography, EI electrocardiographic imaging, PET positron emission tomography, MPI myocardial perfusion imaging, MCTP myocardial computed tomography perfusion, HI histology images, SI scintigraphy images

Fig. 4
Distribution of the number of cardiac studies employing different XAI methods. Grad-CAM Gradient-weighted Class Activation Mapping, LIME Local Interpretable Model-agnostic Explanations, SHAP Shapley Additive Explanations

Fig. 5
The distribution of the diseases targeted in cardiac studies

Fig. 6
Distribution of the number of papers across four categories of XAI evaluation approaches: (i) literature-grounded, (ii) expert-grounded, (iii) proxy-grounded, (iv) none

Fig. 7
Matching the outcome of the evaluation with the outcome of XAI

Fig. 8
The number of the XAI methods used in cardiac applications. Grad-CAM Gradient-weighted Class Activation Mapping, LIME Local Interpretable Model-agnostic Explanations, SHAP Shapley Additive Explanations

Fig. 9
The number of the XAI methods used in cardiac applications. Grad-CAM Gradient-weighted Class Activation Mapping, LIME Local Interpretable Model-agnostic Explanations, SHAP Shapley Additive Explanations
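
Of the four evaluation categories in Fig. 6, proxy-grounded evaluation is the only one that can be automated. A common proxy is a deletion-style faithfulness check: masking the features an explanation ranks highest should degrade the model's prediction more than masking random ones. The sketch below, which reuses the model and SHAP attributions from the earlier snippet, is a simplified illustration of this idea; unlike full ROAR it does not retrain the model, and the function name and masking-by-mean strategy are illustrative assumptions, not the review's protocol.

    import numpy as np

    def deletion_check(model, X, attributions, background, k=3):
        # Drop in predicted probability after masking each sample's top-k
        # attributed features, versus masking k random features. A faithful
        # explanation should yield a much larger drop for the top-k case.
        rng = np.random.default_rng(0)
        base = model.predict_proba(X)[:, 1]
        top = np.argsort(-np.abs(attributions), axis=1)[:, :k]
        rand = rng.integers(0, X.shape[1], size=top.shape)

        def mean_drop(cols_per_row):
            X_masked = X.copy()
            for i, cols in enumerate(cols_per_row):
                X_masked[i, cols] = background[cols]  # replace with feature means
            return float(np.mean(base - model.predict_proba(X_masked)[:, 1]))

        return mean_drop(top), mean_drop(rand)

    # Usage with the earlier sketch (class-1 attributions, assuming the
    # list-per-class output of older shap versions):
    # drop_top, drop_rand = deletion_check(model, X_test,
    #                                      np.asarray(shap_values[1]),
    #                                      X_train.mean(axis=0))

Literature- and expert-grounded evaluations, by contrast, compare explanations against published clinical findings or clinicians' judgment and cannot be reduced to a metric like this.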
