A to-do list for realizing the sequence-to-function paradigm of proteins
- PMID: 40680328
- DOI: 10.1016/j.sbi.2025.103119
Abstract
It has been a longstanding dream of the structural biology and molecular biophysics communities to determine protein function directly from amino acid sequence. Most methods available today, however, are homology- or library-based, and they often fail to identify divergent functions among similar sequences, or similar functions among divergent sequences. The sequence-to-function relationship depends intrinsically on the biophysical space of protein dynamics, which could potentially be exploited to annotate function. Yet, despite three decades of active research, the space of molecular dynamics data remains grossly underpopulated. Through surveys of the existing literature, we highlight this gray area in the context of machine learning methods. We then share examples that point toward learning biophysical representations (or signatures) and combining them with integrative models as a means to robustly associate sequence with function. The aim is to avoid having to spend an impossible thousand years computing protein dynamics to achieve data completeness and generalization.
Copyright © 2025 Elsevier Ltd. All rights reserved.
Conflict of interest statement
Declaration of competing interest: None.