Review
Med Phys. 2022 Jan;49(1):1-14. doi: 10.1002/mp.15359. Epub 2021 Dec 7.

A review of explainable and interpretable AI with applications in COVID-19 imaging

Jordan D Fuhrman et al. Med Phys. 2022 Jan.

Abstract

The development of medical imaging artificial intelligence (AI) systems for evaluating COVID-19 patients has demonstrated potential for improving clinical decision making and assessing patient outcomes during the recent COVID-19 pandemic. These systems have been applied to many medical imaging tasks, including disease diagnosis and patient prognosis, and have augmented other clinical measurements to better inform treatment decisions. Because these systems are used in life-or-death decisions, clinical implementation relies on user trust in the AI output. This has led many developers to utilize explainability techniques in an attempt to help a user understand when an AI algorithm is likely to succeed as well as which cases may be problematic for automatic assessment, thus increasing the potential for rapid clinical translation. Recently, however, AI application to COVID-19 has been marred by controversy. This review discusses several aspects of explainable and interpretable AI as it pertains to the evaluation of COVID-19 disease and how it can restore trust in AI application to this disease. This includes the identification of common tasks that are relevant to explainable medical imaging AI, an overview of several modern approaches for producing explainable output as appropriate for a given imaging scenario, a discussion of how to evaluate explainable AI, and recommendations for best practices in explainable/interpretable AI implementation. This review will allow developers of AI systems for COVID-19 to quickly understand the basics of several explainable AI techniques and assist in the selection of an approach that is both appropriate and effective for a given scenario.

Keywords: AI; COVID-19; deep learning; explainability; interpretability.


Conflict of interest statement

IEN has served as deputy editor of Medical Physics and discloses a relationship with Scientific Advisory Endectra, LLC. MLG is a stockholder and receives royalties, Hologic, Inc.; equity holder and co‐founder, Quantitative Insights, Inc (now Qlarity Imaging); shareholder, QView Medical, Inc.; royalties, General Electric Company, MEDIAN Technologies, Riverain Technologies, LLC, Mitsubishi, and Toshiba.

Figures

FIGURE 1
Examples of questions regarding “explainability” and “interpretability” as defined in this review. While the two imply similar meanings, the intended audience and the implementation of the model output differ between the two
FIGURE 2
Portrayal of the tradeoff between learning performance, which is often associated with the number of learned parameters, and explainability. Note that deep networks are among the most common techniques for ML‐based medical image evaluation, but also have generally low interpretability. There has been a strong push in recent years to develop techniques for general explanation of neural network predictions. Image acquired from Gunning (publicly available presentation with open distribution)
FIGURE 3
(a) Example of embeddings for a diagnosis task. Red points show positive embeddings, while green and black points show negative embeddings. The ovals depict distributions estimated from the points, with the different ovals referencing different presentations of disease. For example, presentations of COVID‐19 on CT images include ground glass opacities, crazy paving, and architectural distortions. The black oval indicates a positive presentation that significantly overlaps with the negative class distribution, indicating that these cases are problematic and may need a larger prevalence in the training set. (b) Example of embeddings for prognostic clinical evaluation. Green, yellow, and red points refer to healthy/mild, intermediate, and severe disease stages with corresponding distributions depicted through ovals. Clinically, this could be used to help identify in which stage a patient lies and appropriately guide clinical decisions
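Embedding plots such as those in Figure 3 are commonly produced by collecting penultimate-layer activations from a trained network and projecting them to two dimensions. The sketch below illustrates this workflow; the feature-extractor attribute, data loader, and PCA projection are illustrative assumptions rather than details taken from the reviewed studies.

```python
# Minimal sketch (assumed setup): extract penultimate-layer embeddings from a
# trained PyTorch CNN and project them to 2D for class-separation plots.
import numpy as np
import torch
from sklearn.decomposition import PCA

def extract_embeddings(model, loader, device="cpu"):
    """Collect flattened penultimate-layer activations and labels."""
    feats, labels = [], []
    model.eval()
    with torch.no_grad():
        for images, targets in loader:
            # Assumes the model exposes its convolutional trunk as
            # `model.features` (true of many torchvision architectures).
            z = model.features(images.to(device))
            feats.append(torch.flatten(z, start_dim=1).cpu().numpy())
            labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)

# X, y = extract_embeddings(trained_model, val_loader)   # hypothetical objects
# X_2d = PCA(n_components=2).fit_transform(X)            # points as in Figure 3
```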
FIGURE 4
Example heatmaps obtained from a variety of techniques (top) and from Grad‐CAM (bottom). Each technique may provide different evaluations of influential regions, both in terms of relative importance and key locations. Further, note that Grad‐CAM may identify regions that are not important to a human observer. Regions outside the lungs were identified with relatively high influence for the network classification even though a radiologist likely would not use this information in COVID‐19 diagnosis. In the examples by Hu (bottom), some of the Grad‐CAM examples demonstrate reasonable heatmap localization (a–c), while others are less intuitive (d, e). Acquired with permission
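For readers unfamiliar with how heatmaps like those in the bottom row of Figure 4 are generated, the following is a minimal Grad-CAM sketch in PyTorch. The choice of target layer and the single-image interface are assumptions for illustration; published implementations differ in these details.

```python
# Minimal sketch (assumed setup): Grad-CAM heatmap for one image and one class.
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Return a coarse heatmap of regions that drive the score for class_idx."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output

    def bwd_hook(_, grad_in, grad_out):
        gradients["value"] = grad_out[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    model.eval()
    scores = model(image.unsqueeze(0))   # forward pass, image is (C, H, W)
    scores[0, class_idx].backward()      # backpropagate the class score
    h1.remove(); h2.remove()

    # Global-average-pool the gradients to weight each feature map, then keep
    # only positive contributions (ReLU), as in the original Grad-CAM paper.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()
```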
FIGURE 5
(a) Filter, (b) wrapper, and (c) embedded feature selection methods. Filter methods perform the feature selection independently of construction of the classification model. Wrapper methods iteratively select or eliminate a set of features using the prediction accuracy of the classification model. In embedded methods, the feature selection is an integral part of the classification model. Obtained with permission under MDPI Open Access Policy
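As a concrete point of reference for the three families in Figure 5, the sketch below pairs each with a standard scikit-learn implementation. The feature matrix X and labels y are hypothetical (e.g., handcrafted radiomic features with a binary COVID-19 label), and the particular estimators are illustrative choices rather than the methods used in the cited work.

```python
# Minimal sketch (assumed data): filter, wrapper, and embedded feature
# selection with scikit-learn, mirroring panels (a)-(c) of Figure 5.
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# (a) Filter: rank features by a univariate statistic, independent of any classifier.
filter_selector = SelectKBest(score_func=f_classif, k=10)

# (b) Wrapper: recursive feature elimination refits the classifier and drops
# the weakest features on each iteration.
wrapper_selector = RFE(estimator=LogisticRegression(max_iter=1000),
                       n_features_to_select=10)

# (c) Embedded: an L1-penalized model performs the selection during training;
# features with nonzero coefficients are the ones retained.
embedded_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)

# X_filt = filter_selector.fit_transform(X, y)    # X, y are hypothetical
# X_wrap = wrapper_selector.fit_transform(X, y)
# embedded_model.fit(X, y)
```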
FIGURE 6
Shapley values acquired for classification of several example images. Note that this technique can identify both positively and negatively influential pixels. In general, these examples follow expectations with peripheral, lower lobe features providing generally more influence than other regions of the lung for COVID‐19 diagnosis
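Per-pixel attributions such as those in Figure 6 are typically computed with an approximation, since exact Shapley values are intractable for deep networks. The sketch below shows one such approach using the open-source shap package with a PyTorch classifier; the model, background batch, and test images are hypothetical placeholders, and this is not necessarily the implementation used to produce the figure.

```python
# Minimal sketch (assumed setup): approximate per-pixel Shapley values for an
# image classifier using the shap package's GradientExplainer.
import shap

def explain_images(model, background, test_batch):
    """Return approximate Shapley values for each image in test_batch.

    background is a small reference batch that stands in for "missing" pixels
    when attributions are estimated (expected-gradients approximation).
    """
    model.eval()
    explainer = shap.GradientExplainer(model, background)
    shap_values = explainer.shap_values(test_batch)
    # Positive values push the prediction toward the class of interest
    # (influential pixels in Figure 6); negative values push away from it.
    return shap_values

# values = explain_images(trained_model, train_images[:50], val_images[:4])
# shap.image_plot can then render overlays analogous to the panels in Figure 6.
```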
