Review

Human-centered explainability for life sciences, healthcare, and medical informatics

Sanjoy Dey et al. Patterns (N Y). 2022 May 13;3(5):100493. doi: 10.1016/j.patter.2022.100493.

Abstract

Rapid advances in artificial intelligence (AI) and the availability of biological, medical, and healthcare data have enabled the development of a wide variety of models. Significant success has been achieved in a wide range of fields, such as genomics, protein folding, disease diagnosis, imaging, and clinical tasks. Although widely used, the inherent opacity of deep AI models has drawn criticism from the research community and has limited their adoption in clinical practice. Concurrently, a significant amount of research, reviewed here, has focused on making such methods more interpretable, but critiques of explainability in AI (XAI), of its requirements, and concerns about fairness and robustness have hampered real-world adoption. Here we discuss how user-driven XAI can be made more useful for different healthcare stakeholders through the definition of three key personas (data scientists, clinical researchers, and clinicians) and present an overview of how different XAI approaches can address their needs. For illustration, we also walk through several research and clinical examples that take advantage of open-source XAI tools, including those that enhance the explanation of results through visualization. This perspective thus aims to serve as a guide for developing explainability solutions for healthcare, empowering both subject matter experts, by surveying the available tools, and explainability developers, by illustrating how such methods can shape the adoption of solutions in practice.

Keywords: artificial intelligence; clinical research; explainability; healthcare; life sciences; machine learning.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Taxonomy tree for explainability in AI models. To identify the most appropriate explanation method, we propose a taxonomy of questions, represented as a decision tree, to help navigate the process. The green leaf nodes represent algorithms included in the current release of AI Explainability 360. Considering the data, different choices are possible depending on its representation and the desired understanding: for data understanding based on features, theory can yield disentangled representations, as in Disentangled Inferred Prior Variational AutoEncoders (DIP-VAEs); otherwise, a sample-based approach using ProtoDash provides a way to do case-based reasoning. If the goal is to explain models rather than data, the next question is whether a local explanation for individual samples or a global explanation for the entire model is needed. Following the local path, the next question is whether the explanation should come from a post hoc method or a self-explaining one. On the self-explaining branch, TED (teaching explanations for decision making) is one option. On the model-agnostic post hoc branch, the choice between explaining in terms of samples or features comes up again: on the sample side, prototypes (ProtoDash) reappear, while on the feature side the contrastive explanation method (CEM) as well as popular algorithms such as LIME and SHAP are available. On the model-specific branch, one has to choose between modifying models, surrogate models, or simply visualizations. Going back up to global explanations for the entire model, the question is again whether a post hoc method or a directly interpretable model is needed: on the post hoc global side, surrogate approaches such as ProfWeight are available, while a directly interpretable model, such as a Boolean rule set from BRCG (Boolean rule sets with column generation) or GLRMs (generalized linear rule models), can yield the answer directly.
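To make the decision path above concrete, the sketch below encodes the caption's questions as a small Python helper. The method names are the ones listed in the taxonomy, but the question flow is a simplified, hypothetical rendering of the tree and is not the AI Explainability 360 API.

def suggest_explainer(explain_data: bool,
                      feature_based: bool = True,
                      local: bool = True,
                      post_hoc: bool = True) -> str:
    """Minimal sketch of the Figure 1 taxonomy (simplified; not the AIX360 API).

    explain_data  -- True to understand the data itself, False to explain a model
    feature_based -- feature-level vs. sample-based (case-based) understanding
    local         -- local explanation for one sample vs. global for the whole model
    post_hoc      -- explain an already-trained model vs. use a self-explaining one
    """
    if explain_data:
        # Data understanding: disentangled features or representative samples.
        return "DIP-VAE" if feature_based else "ProtoDash"
    if local:
        if not post_hoc:
            return "TED"  # self-explaining, local
        # Model-agnostic post hoc, local: features or samples again.
        return "CEM / LIME / SHAP" if feature_based else "ProtoDash"
    # Global explanations for the entire model.
    return "ProfWeight (surrogate)" if post_hoc else "BRCG / GLRM"


# Example: a post hoc, feature-level reason for a single prediction.
print(suggest_explainer(explain_data=False, feature_based=True,
                        local=True, post_hoc=True))  # -> "CEM / LIME / SHAP"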
Figure 2
Health XAI persona continuum and roles. (Top) The different personas relevant for user-centric XAI models, together with their domain knowledge and XAI roles. (Bottom) Example of the specific roles of the three personas for a real-world problem: designing an explainable progression model for chronic diseases.
Figure 3
Applications of four popular XAI methods. (A) LIME optimizes the local faithfulness and the complexity of an explanation. It has two versions, providing local and global explanations, which we applied to a COVID-19 longitudinal dataset to represent the clinico-genomic factors associated with COVID-19 severity. The local/global importance of single nucleotide polymorphisms, indicated by their chromosomal location, and of clinical variables relative to the patient outcome is shown in green or red for positive or negative association, respectively. (B) SHAP is a generalization of six linear explanation models based on Shapley regression values. Here, Shapley regression values are applied to a type 2 diabetes longitudinal dataset consisting of electronic health records (EHRs); red dots represent variables that negatively influence the outcome and blue dots variables that positively influence it, as indicated by the SHAP value. (C) The contrastive explanation method (CEM) finds the pertinent positives and pertinent negatives, i.e., what is minimally and sufficiently present or absent, respectively, for a given class. (Bottom) The pertinent positive and pertinent negative regions of interest (ROIs) of the brain for an fMRI imaging dataset used to differentiate between autistic (A) and neurotypical (T) subjects, represented in the columns of the matrices. The raw imaging features were summarized into seven brain regions, represented by the rows of the matrices. The blue hue represents the importance of the regions when using the LRB algorithm (left matrix) or CEM (right matrix); see Dhurandhar et al. (D) ProtoDash finds prototype samples by summarizing the underlying distribution; here it was applied to rank the importance of 19 olfactory descriptors used to predict the odor of pure molecules described by 131 descriptors. Note that the order of the descriptors does not change when using only word embeddings for prediction (ImpSem) or psychophysical olfactory measurements (DirSem). For the equation describing each of these methods, see Figure S1B.
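As a concrete illustration of the kind of local, feature-level attribution shown in (A), the following sketch applies the open-source lime package to a synthetic tabular dataset with invented clinico-genomic feature names; it is not the authors' COVID-19 analysis.

# Minimal sketch of a LIME local explanation on synthetic tabular data, loosely
# mirroring panel (A). Feature names and labels are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
feature_names = ["age", "crp", "snp_chr3_rs1", "snp_chr9_rs2", "comorbidity_count"]
X = rng.normal(size=(500, len(feature_names)))
# Toy "severity" label driven by two of the features.
y = (X[:, 1] + 0.8 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X, feature_names=feature_names,
    class_names=["mild", "severe"], mode="classification",
)
# Local explanation for one patient: signed weights per feature, where positive
# weights push the prediction toward "severe" and negative weights toward "mild".
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
for feature, weight in exp.as_list():
    print(f"{feature:>30s}  {weight:+.3f}")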
Figure 4
Knowledge transfer for explainability. (A) Scheme of transfer learning from a complex model (right) to a simpler one (left), where a globally explainable model is constructed from local explanations. (B) (Left) Example of a neural network with two hidden layers and the associated probes, i.e., linear classifiers. To the right are shown an easy example of the digit 7 from the MNIST (Modified National Institute of Standards and Technology) dataset (top) and a hard example (bottom), with their associated area under the curve (AUC), approximated by averaging the classification performance of the two probes. The probe output indicates how easy or hard an example is to classify: the easy example is classified well already from the first layer, whereas the hard example is a poorly written 7 that the network struggles to classify correctly, and only the higher-level probes, essentially the full deep neural network, reach high performance. (Right) The AUC is then used to weight the loss function of a simple model, which is retrained. If the simple model does not support a weighted loss function, the training set can instead be re-sampled according to example difficulty, following the re-weighted ratios.
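The re-weighting step in (B) can be sketched as follows, substituting the complex model's confidence in the true label for the probe-averaged AUC (a simplification of ProfWeight rather than the authors' implementation) and passing those scores as per-sample weights when retraining a simple, interpretable model.

# Minimal sketch of confidence-based sample re-weighting in the spirit of panel (B).
# The "ease" score is the complex model's probability of the true class, used here as
# a stand-in for the probe-averaged AUC described in the caption.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# "Complex" teacher model.
teacher = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Per-sample ease score: the teacher's confidence in the correct label.
proba = teacher.predict_proba(X_tr)
ease = proba[np.arange(len(y_tr)), y_tr]

# Simple, interpretable student trained with the ease scores as sample weights,
# so that examples the teacher handles more confidently receive larger weight.
student = DecisionTreeClassifier(max_depth=4, random_state=0)
student.fit(X_tr, y_tr, sample_weight=ease)

print("teacher accuracy:", teacher.score(X_te, y_te))
print("weighted student accuracy:", student.score(X_te, y_te))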
Figure 5
Example of what-if analysis tools. (Left) Overview of the RetainVis RNN ("RETAIN") model, showing the contribution of patient visits to the overall outcome through feature contribution scores representing drugs (violet), diagnoses (yellow), or physiological markers (green) for each visit. (Bottom) The patient list shows individual patients as rows of rectangles; users can select a patient of interest to view details (shown below) and edit patients to conduct a what-if analysis. (Right, top) Dimensionality reduction techniques such as t-SNE (t-distributed stochastic neighbor embedding) yield the blue scatterplot, which gives an overview; users can then build patient cohorts with the lasso selection tool and inspect the distributions of demographic information such as biological sex, age, and risk prediction scores (red circle). (Right, bottom) Contribution scores for each visit and patient details are shown after the results of the what-if analysis are updated. In the middle, an area chart shows the aggregated contribution scores of nine medical codes over time, with the mean and standard deviation drawn as an area; users can also see the medical codes and their mean contribution scores in a bar chart.
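The edit-and-rerun mechanic behind such a what-if analysis can be illustrated with a toy sketch; the scoring function, medical codes, and weights below are invented stand-ins for a trained RETAIN-style model.

# Toy sketch of a what-if edit on a patient's visit history. Only the edit-and-rerun
# mechanic mirrors the tool described in the caption; codes and weights are invented.
from copy import deepcopy

# Hypothetical per-code risk weights (these would come from a trained model).
CODE_WEIGHTS = {"metformin": -0.4, "insulin": 0.6, "hba1c_high": 0.9, "bp_high": 0.3}

def predict(visits):
    """Return (risk score, per-visit contributions) for a list of visits (code lists)."""
    contributions = [sum(CODE_WEIGHTS.get(code, 0.0) for code in visit) for visit in visits]
    return sum(contributions), contributions

patient = [["hba1c_high", "bp_high"], ["insulin", "hba1c_high"], ["insulin"]]
base_score, base_contrib = predict(patient)

# What-if: edit the second visit, replacing insulin with metformin, and re-run.
edited = deepcopy(patient)
edited[1] = ["metformin", "hba1c_high"]
new_score, new_contrib = predict(edited)

print(f"baseline risk {base_score:+.2f}, per-visit {base_contrib}")
print(f"what-if  risk {new_score:+.2f}, per-visit {new_contrib}")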
Figure 6
Visualization and explanations for three types of users. (A) ProtoDash and the CEM Explorer allow data scientists to inspect a trained model using the contrastive explanation method. The examples shown relate to an RNN model predicting inpatient re-admission risk based on previous emergency room (ER) visits and variables extracted from insurance claims, such as hospital-acquired condition, vascular-catheter-associated infection, length of stay, number of diagnoses on claims, and number of prior ER visits. Each representative patient (yellow line and yellow dot) is extracted using ProtoDash, and CEM is then used to obtain explanations, as represented by the different colors: red box, inpatient; yellow box, outpatient; and green box, skilled nursing facility (SNF). (B) DPVis helps clinical researchers understand disease progression patterns by interacting with multiple coordinated visualizations. (Top) Diagram of a hidden Markov model (HMM) and the used/unused variables for finding disease progression states across visits over time; the HMM extracts the most probable sequence of states for a specific patient. (Bottom) The waterfall view shows the state progression patterns and time/age for each patient (represented by a line) over time, as well as the age distribution at diagnosis for the whole cohort (red) and the selected cohort (yellow); overlap is shown in orange. (C) RetainVis helps clinicians test how an RNN-based model performs on a set of patients (top) by conducting various what-if analyses. (Middle) Single-patient view of the feature contribution scores, representing drugs (violet), diagnoses (yellow), or physiological markers (green) for each visit in the treatment pathway. (Bottom) What-if questions can be answered by editing patient visits: medical records and timestamps can be modified for each visit, and re-running the model yields new predictions and contributions over the patient's visits. Contribution scores show how much each medical code and visit affects the final prediction score. Top contribution scores can also be generated per patient, and for multiple patients by aggregating the scores.
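The core step in (B), decoding the most probable sequence of progression states from an HMM fit to visit-level features, can be sketched with the open-source hmmlearn package on synthetic data; the three states and the features are invented for illustration, and this is not DPVis itself.

# Minimal sketch of the HMM step in panel (B): fit a Gaussian HMM on synthetic
# visit-level features and decode each patient's most probable state sequence.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

# Synthetic longitudinal data: 50 patients, 10 visits each, 4 features per visit.
n_patients, n_visits, n_features = 50, 10, 4
X = rng.normal(size=(n_patients * n_visits, n_features))
lengths = [n_visits] * n_patients  # visits per patient, keeping sequences separate

# Fit a 3-state Gaussian HMM as a stand-in for "disease progression states".
model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=50, random_state=0)
model.fit(X, lengths)

# Viterbi decoding: the most probable hidden state for each visit.
states = model.predict(X, lengths)
first_patient_states = states[:n_visits]
print("state sequence for the first patient:", first_patient_states)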
