This is a preprint.
An intrinsically interpretable neural network architecture for sequence to function learning
- PMID: 36747873
- PMCID: PMC9900791
- DOI: 10.1101/2023.01.25.525572
Update in
- An intrinsically interpretable neural network architecture for sequence-to-function learning. Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i413-i422. doi: 10.1093/bioinformatics/btad271. PMID: 37387140. Free PMC article.
Abstract
Motivation: Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post hoc analyses, and even then, the internal mechanics of highly parameterized models often cannot be explained. Here, we introduce a deep learning architecture called tiSFM (totally interpretable sequence to function model). tiSFM improves upon the performance of standard multi-layer convolutional models while using fewer parameters. Additionally, although tiSFM is itself technically a multi-layer neural network, its internal model parameters are intrinsically interpretable in terms of relevant sequence motifs.
Results: We analyze published open chromatin measurements across hematopoietic lineage cell types and demonstrate that tiSFM outperforms a state-of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context-specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B-cells, and Rorc for innate lymphoid cells. tiSFM's model parameters have biologically meaningful interpretations, and we show the utility of our approach on the complex task of predicting the change in epigenetic state as a function of developmental transition.
Availability and implementation: The source code, implemented in Python and including scripts for the analysis of key findings, is available at https://github.com/boooooogey/ATAConv.
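Illustrative note: to make concrete what it can mean for a convolutional parameter to be "interpretable in terms of sequence motifs", the sketch below scans a one-hot encoded DNA sequence with a single motif matrix, so that the filter weights can be read directly as per-position base preferences. This is not taken from the tiSFM source code. The function names, the toy motif, and the scoring scheme are all assumptions made for illustration only.

```python
import numpy as np

BASES = "ACGT"  # column order used for the one-hot encoding (an assumption)


def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    idx = [BASES.index(b) for b in seq]
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out


def scan_motif(seq, motif):
    """Slide a (width, 4) motif matrix along seq; return one score per window."""
    x = one_hot(seq)
    width = motif.shape[0]
    n_windows = len(seq) - width + 1
    return np.array([np.sum(x[i:i + width] * motif) for i in range(n_windows)])


# Hypothetical toy motif that rewards the 3-mer "TGA" and penalizes other bases.
# Rows are motif positions, columns follow BASES (A, C, G, T).
toy_motif = np.array([
    [-1.0, -1.0, -1.0, 1.0],   # position 1 prefers T
    [-1.0, -1.0, 1.0, -1.0],   # position 2 prefers G
    [1.0, -1.0, -1.0, -1.0],   # position 3 prefers A
])

scores = scan_motif("ACTGACC", toy_motif)
best = int(np.argmax(scores))   # start index of the best-matching window
pooled = float(scores.max())    # max-pooled motif activity for the sequence
print(scores, best, pooled)
```

In this toy setting each filter corresponds to one motif, so a downstream weight attached to `pooled` can be read as the contribution of that motif to the prediction. How tiSFM itself defines and constrains its layers is described in the paper, not here.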