This is a preprint.
Learning Biophysical Dynamics with Protein Language Models
- PMID: 39464109
- PMCID: PMC11507661
- DOI: 10.1101/2024.10.11.617911
Abstract
Structural dynamics are fundamental to protein functions and mutation effects. Current protein deep learning models are predominantly trained on sequence and/or static structure data, which often fail to capture the dynamic nature of proteins. To address this, we introduce SeqDance and ESMDance, two protein language models trained on dynamic biophysical properties derived from molecular dynamics simulations and normal mode analyses of over 64,000 proteins. SeqDance, trained from scratch, learns both local dynamic interactions and global conformational properties for ordered and disordered proteins. Changes in SeqDance-predicted dynamic properties reflect the effects of mutations on protein folding stability. ESMDance, built upon ESM2 outputs, substantially outperforms ESM2 in zero-shot prediction of mutation effects for designed and viral proteins, which lack evolutionary information. Together, SeqDance and ESMDance offer a new framework for integrating protein dynamics into language models, enabling more generalizable predictions of protein behavior and mutation effects.
Keywords: molecular dynamics; mutation effects; normal mode analysis; protein language model.
Conflict of interest statement
The authors declare no competing interests.