Review
Med Phys. 2022 Jan;49(1):1-14. doi: 10.1002/mp.15359. Epub 2021 Dec 7.

A review of explainable and interpretable AI with applications in COVID-19 imaging

Jordan D Fuhrman et al. Med Phys. 2022 Jan.

Abstract

The development of medical imaging artificial intelligence (AI) systems for evaluating COVID-19 patients has demonstrated potential for improving clinical decision making and assessing patient outcomes during the recent COVID-19 pandemic. These systems have been applied to many medical imaging tasks, including disease diagnosis and patient prognosis, and have augmented other clinical measurements to better inform treatment decisions. Because these systems are used in life-or-death decisions, clinical implementation relies on user trust in the AI output. This has led many developers to utilize explainability techniques in an attempt to help a user understand when an AI algorithm is likely to succeed as well as which cases may be problematic for automatic assessment, thus increasing the potential for rapid clinical translation. Recently, however, AI application to COVID-19 has been marred by controversy. This review discusses several aspects of explainable and interpretable AI as it pertains to the evaluation of COVID-19 disease and how it can restore trust in AI application to this disease. This includes the identification of common tasks that are relevant to explainable medical imaging AI, an overview of several modern approaches for producing explainable output as appropriate for a given imaging scenario, a discussion of how to evaluate explainable AI, and recommendations for best practices in explainable/interpretable AI implementation. This review will allow developers of AI systems for COVID-19 to quickly understand the basics of several explainable AI techniques and assist in the selection of an approach that is both appropriate and effective for a given scenario.

Keywords: AI; COVID-19; deep learning; explainability; interpretability.


Conflict of interest statement

IEN has served as deputy editor of Medical Physics and discloses a relationship with Scientific Advisory Endectra, LLC. MLG is a stockholder and receives royalties, Hologic, Inc.; equity holder and co‐founder, Quantitative Insights, Inc (now Qlarity Imaging); shareholder, QView Medical, Inc.; royalties, General Electric Company, MEDIAN Technologies, Riverain Technologies, LLC, Mitsubishi, and Toshiba.

Figures

FIGURE 1
Examples of questions regarding “explainability” and “interpretability” as defined in this review. While the two imply similar meanings, the intended audience and the implementation of the model output differ between the two
FIGURE 2
Portrayal of the tradeoff between learning performance, which is often associated with the number of learned parameters, and explainability. Note that deep networks are among the most common techniques for ML‐based medical image evaluation, but also have generally low interpretability. There has been a strong push in recent years to develop techniques for general explanation of neural network predictions. Image acquired from Gunning (publicly available presentation with open distribution)
FIGURE 3
(a) Example of embeddings for a diagnosis task. Red points show positive embeddings, while green and black points show negative embeddings. The ovals depict distributions estimated from the points, with the different ovals referencing different presentations of disease. For example, presentations of COVID‐19 on CT images include ground glass opacities, crazy paving, and architectural distortions. The black oval indicates a positive presentation that significantly overlaps with the negative class distribution, indicating that these cases are problematic and may need a larger prevalence in the training set. (b) Example of embeddings for prognostic clinical evaluation. Green, yellow, and red points refer to healthy/mild, intermediate, and severe disease stages with corresponding distributions depicted through ovals. Clinically, this could be used to help identify in which stage a patient lies and appropriately guide clinical decisions
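Embedding plots such as those in Figure 3 are commonly produced by collecting penultimate-layer activations from a trained network and projecting them to two dimensions. The sketch below illustrates this workflow; the feature-extractor attribute, data loader, and PCA projection are illustrative assumptions rather than details taken from the reviewed studies.

```python
# Minimal sketch (assumed setup): extract penultimate-layer embeddings from a
# trained PyTorch CNN and project them to 2D for class-separation plots.
import numpy as np
import torch
from sklearn.decomposition import PCA

def extract_embeddings(model, loader, device="cpu"):
    """Collect flattened penultimate-layer activations and labels."""
    feats, labels = [], []
    model.eval()
    with torch.no_grad():
        for images, targets in loader:
            # Assumes the model exposes its convolutional trunk as
            # `model.features` (true of many torchvision architectures).
            z = model.features(images.to(device))
            feats.append(torch.flatten(z, start_dim=1).cpu().numpy())
            labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)

# X, y = extract_embeddings(trained_model, val_loader)   # hypothetical objects
# X_2d = PCA(n_components=2).fit_transform(X)            # points as in Figure 3
```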
FIGURE 4
Example heatmaps obtained from a variety of techniques (top) and from Grad‐CAM (bottom). Each technique may provide different evaluations of influential regions, both in terms of relative importance and key locations. Further, note that Grad‐CAM may identify regions that are not important to a human observer. Regions outside the lungs were identified with relatively high influence for the network classification even though a radiologist likely would not use this information in COVID‐19 diagnosis. In the examples by Hu (bottom), some of the Grad‐CAM examples demonstrate reasonable heatmap localization (a–c), while others are less intuitive (d, e). Acquired with permission
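For readers unfamiliar with how heatmaps like those in the bottom row of Figure 4 are generated, the following is a minimal Grad-CAM sketch in PyTorch. The choice of target layer and the single-image interface are assumptions for illustration; published implementations differ in these details.

```python
# Minimal sketch (assumed setup): Grad-CAM heatmap for one image and one class.
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Return a coarse heatmap of regions that drive the score for class_idx."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output

    def bwd_hook(_, grad_in, grad_out):
        gradients["value"] = grad_out[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    model.eval()
    scores = model(image.unsqueeze(0))   # forward pass, image is (C, H, W)
    scores[0, class_idx].backward()      # backpropagate the class score
    h1.remove(); h2.remove()

    # Global-average-pool the gradients to weight each feature map, then keep
    # only positive contributions (ReLU), as in the original Grad-CAM paper.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()
```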
FIGURE 5
(a) Filter, (b) wrapper, and (c) embedded feature selection methods. Filter methods perform the feature selection independently of construction of the classification model. Wrapper methods iteratively select or eliminate a set of features using the prediction accuracy of the classification model. In embedded methods, the feature selection is an integral part of the classification model. Obtained with permission under MDPI Open Access Policy
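As a concrete point of reference for the three families in Figure 5, the sketch below pairs each with a standard scikit-learn implementation. The feature matrix X and labels y are hypothetical (e.g., handcrafted radiomic features with a binary COVID-19 label), and the particular estimators are illustrative choices rather than the methods used in the cited work.

```python
# Minimal sketch (assumed data): filter, wrapper, and embedded feature
# selection with scikit-learn, mirroring panels (a)-(c) of Figure 5.
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# (a) Filter: rank features by a univariate statistic, independent of any classifier.
filter_selector = SelectKBest(score_func=f_classif, k=10)

# (b) Wrapper: recursive feature elimination refits the classifier and drops
# the weakest features on each iteration.
wrapper_selector = RFE(estimator=LogisticRegression(max_iter=1000),
                       n_features_to_select=10)

# (c) Embedded: an L1-penalized model performs the selection during training;
# features with nonzero coefficients are the ones retained.
embedded_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)

# X_filt = filter_selector.fit_transform(X, y)    # X, y are hypothetical
# X_wrap = wrapper_selector.fit_transform(X, y)
# embedded_model.fit(X, y)
```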
FIGURE 6
Shapley values acquired for classification of several example images. Note that this technique can identify both positively and negatively influential pixels. In general, these examples follow expectations with peripheral, lower lobe features providing generally more influence than other regions of the lung for COVID‐19 diagnosis
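Per-pixel attributions such as those in Figure 6 are typically computed with an approximation, since exact Shapley values are intractable for deep networks. The sketch below shows one such approach using the open-source shap package with a PyTorch classifier; the model, background batch, and test images are hypothetical placeholders, and this is not necessarily the implementation used to produce the figure.

```python
# Minimal sketch (assumed setup): approximate per-pixel Shapley values for an
# image classifier using the shap package's GradientExplainer.
import shap

def explain_images(model, background, test_batch):
    """Return approximate Shapley values for each image in test_batch.

    background is a small reference batch that stands in for "missing" pixels
    when attributions are estimated (expected-gradients approximation).
    """
    model.eval()
    explainer = shap.GradientExplainer(model, background)
    shap_values = explainer.shap_values(test_batch)
    # Positive values push the prediction toward the class of interest
    # (influential pixels in Figure 6); negative values push away from it.
    return shap_values

# values = explain_images(trained_model, train_images[:50], val_images[:4])
# shap.image_plot can then render overlays analogous to the panels in Figure 6.
```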
