Harnessing deep learning for proteome-scale detection of amyloid signaling motifs
- PMID: 40662825
- PMCID: PMC12261475
- DOI: 10.1093/bioinformatics/btaf200
Abstract
Motivation: Amyloid signaling sequences adopt the cross-β fold, which can self-replicate through templating. Propagation of the amyloid fold from a receptor to an effector protein is used for signal transduction in immune response pathways in animals, fungi, and bacteria. So far, a dozen families of amyloid signaling motifs (ASMs) have been classified. However, because ASMs are highly diverse, they are difficult to identify in large protein databases, which limits the scope for experimental studies. To date, various deep learning (DL) models have been applied across a range of protein-related tasks, including domain family classification and the prediction of protein structure and protein-protein interactions.
Results: In this study, we develop tailor-made bidirectional LSTM and BERT-based architectures to model ASMs, and compare their performance against a state-of-the-art machine learning grammatical model. We focus on developing a discriminative model of generalized ASMs that is capable of detecting ASMs in large datasets. The DL-based models are trained on a diverse set of motif families and a global negative set, and are then used to identify ASMs from remotely related families. We analyze how both models represent the data and demonstrate that the DL-based approaches effectively detect ASMs, including novel motifs, even at the genome scale.
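As a rough illustration of what detecting short motifs in long protein sequences can involve, the sketch below slides a fixed-length window along a sequence and scores each fragment. It is a minimal sketch under stated assumptions, not the authors' implementation: the windowing scheme, the names `scan_sequence` and `best_hit`, and the parameters `window` and `step` are hypothetical, and `toy_score` is a trivial stand-in for a trained BiLSTM or BERT-based classifier.

```python
from typing import Callable, List, Tuple


def toy_score(fragment: str) -> float:
    """Hypothetical stand-in for a trained model.

    Returns the fraction of G, N and Q residues in the fragment.
    A real detector would return a model probability instead.
    """
    if not fragment:
        return 0.0
    return sum(fragment.count(aa) for aa in "GNQ") / len(fragment)


def scan_sequence(
    sequence: str,
    score_fn: Callable[[str], float],
    window: int = 20,
    step: int = 1,
) -> List[Tuple[int, float]]:
    """Score every window of length `window`, moving `step` residues at a time.

    Returns a list of (start_index, score) pairs. Sequences no longer than
    the window are scored once as a whole.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(sequence) <= window:
        return [(0, score_fn(sequence))]
    return [
        (start, score_fn(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]


def best_hit(
    sequence: str,
    score_fn: Callable[[str], float],
    window: int = 20,
    step: int = 1,
) -> Tuple[int, float]:
    """Return the (start_index, score) of the highest-scoring window."""
    return max(scan_sequence(sequence, score_fn, window, step), key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy sequence: a short G/N/Q-rich stretch flanked by alanines.
    seq = "A" * 10 + "GNQG" + "A" * 10
    print(best_hit(seq, toy_score, window=4))
```

Passing the scorer as a function keeps the scanning logic independent of the model, so any classifier that maps a fragment to a score could be substituted for `toy_score`.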
Availability and implementation: The models are provided as a Python package, asmscan-bilstm, and a Docker image at https://github.com/chrispysz/asmscan-proteinbert-run. The source code can be accessed at https://github.com/jakub-galazka/asmscan-bilstm and https://github.com/chrispysz/asmscan-proteinbert. Data and results are at https://github.com/wdyrka-pwr/ASMscan.
© The Author(s) 2025. Published by Oxford University Press.