Review
Nat Methods. 2024 Aug;21(8):1454-1461.
doi: 10.1038/s41592-024-02359-7. Epub 2024 Aug 9.

Applying interpretable machine learning in computational biology: pitfalls, recommendations and opportunities for new developments

Valerie Chen et al. Nat Methods. 2024 Aug.

Abstract

Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology are generally underdeveloped. We provide an overview of IML methods and evaluation techniques and discuss common pitfalls encountered when applying IML methods to computational biology problems. We also highlight open questions, especially in the era of large language models, and call for collaboration between IML and computational biology researchers.

Conflict of interest statement

A.T. received gift research grants from Meta, Morgan Stanley, and Amazon. J.M. received a gift research grant from Google Research. A.T. works part-time for Amplify Partners. The other authors declare no competing interests.

Figures

Figure 1:
The two main IML approaches used to explain prediction models are post-hoc explanations and by-design explanations. Each approach has its own canonical workflow and popular types of IML methods: post-hoc explanations are model-agnostic and are applied after a model is trained, whereas by-design explanations are typically built into, or inherent to, the model architecture.
Figure 2:
How do we assess the explanations generated by an IML method, which attribute importance scores to the features of an input? IML methods are typically evaluated on two criteria: the faithfulness of their computed feature importance scores relative to a known ground-truth mechanism, and the stability of those scores (e.g., as denoted by error bars) across varied input data.
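The two evaluation criteria in this caption can be illustrated with a small sketch. It is not the authors' protocol: the synthetic data, the linear model, the weight-based importance score, the top-k definition of faithfulness, and the bootstrap definition of stability are all assumptions chosen to keep the example short.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the feature weights."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1]

def importance(X, y):
    """Assumed importance score: |weight| scaled by the feature's standard deviation."""
    return np.abs(fit_linear(X, y)) * X.std(axis=0)

def faithfulness(scores, true_features):
    """Fraction of the top-k scored features that are in the known ground truth."""
    k = len(true_features)
    top_k = np.argsort(scores)[::-1][:k]
    return len(set(top_k.tolist()) & set(true_features)) / k

def stability(X, y, n_boot=50, seed=0):
    """Mean and standard deviation of the scores over bootstrap resamples of the data."""
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))
        runs.append(importance(X[idx], y[idx]))
    runs = np.asarray(runs)
    return runs.mean(axis=0), runs.std(axis=0)

# Synthetic data with a known mechanism: only features 0, 3 and 7 drive y.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))
true_features = [0, 3, 7]
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 1.5 * X[:, 7] + rng.normal(scale=0.1, size=500)

scores = importance(X, y)
faith = faithfulness(scores, true_features)
mean_scores, std_scores = stability(X, y)

print("faithfulness (top-k overlap):", faith)
print("per-feature std across resamples (error bars):", np.round(std_scores, 3))
```

The per-feature standard deviations play the role of the error bars mentioned in the caption. In real biological data a ground-truth mechanism is rarely fully known, so the faithfulness check would typically rely on simulated data or on well-characterized features.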
Figure 3:
An overview of three common pitfalls of IML interpretation in biological contexts and how to avoid these pitfalls. 1) Only considering one IML method. Consideration of multiple IML methods can inform the downstream interpretation of the outputs. 2) IML output disconnected from biological interpretation. Oftentimes, a post-processing step is necessary to enable interpretation of the IML output, particularly when the method is applied to sequence or pixel-level data. 3) Cherry-picked presentation of results. Many prior works do not present a complete picture of the extent to which the IML output reflects known biological mechanisms.
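As a rough illustration of the first pitfall, the sketch below scores the same fitted model with two different IML methods and checks how well they agree. The two methods (a weight-based score and permutation importance), the synthetic data, and the agreement measures (Spearman rank correlation and top-k overlap) are assumptions made for this example, not choices taken from the review.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares; returns (weights, intercept)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def weight_importance(X, w):
    """Method A: |weight| scaled by the feature's standard deviation."""
    return np.abs(w) * X.std(axis=0)

def permutation_importance(X, y, w, b, rng):
    """Method B: increase in mean squared error after shuffling one feature."""
    base = np.mean((X @ w + b - y) ** 2)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        scores[j] = np.mean((Xp @ w + b - y) ** 2) - base
    return scores

def spearman(a, b):
    """Spearman rank correlation (assumes no tied scores)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

# Synthetic data: only features 0, 3 and 7 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 1.5 * X[:, 7] + rng.normal(scale=0.1, size=500)

w, b = fit_linear(X, y)
scores_a = weight_importance(X, w)
scores_b = permutation_importance(X, y, w, b, rng)

k = 3
top_a = set(np.argsort(scores_a)[::-1][:k].tolist())
top_b = set(np.argsort(scores_b)[::-1][:k].tolist())
rho = spearman(scores_a, scores_b)

print("top-k features, method A:", sorted(top_a))
print("top-k features, method B:", sorted(top_b))
print("Spearman rank correlation:", round(float(rho), 3))
```

When the methods agree on the top-ranked features, as they should on this easy synthetic problem, the downstream interpretation rests on firmer ground. Disagreement would be a signal to investigate before drawing biological conclusions from either method alone.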
