A protein language model for exploring viral fitness landscapes
- PMID: 40360496
- PMCID: PMC12075601
- DOI: 10.1038/s41467-025-59422-w
Abstract
Successively emerging SARS-CoV-2 variants lead to repeated epidemic surges through escalated fitness (i.e., relative effective reproduction number between variants). Modeling the genotype-fitness relationship enables us to pinpoint the mutations boosting viral fitness and flag high-risk variants immediately after their detection. Here, we present CoVFit, a protein language model adapted from ESM-2, designed to predict variant fitness based solely on spike protein sequences. CoVFit was trained on genotype-fitness data derived from viral genome surveillance and functional mutation assays related to immune evasion. CoVFit successfully ranked the fitness of unknown future variants harboring nearly 15 mutations with informative accuracy. CoVFit identified 959 fitness elevation events throughout SARS-CoV-2 evolution until late 2023. Furthermore, we show that CoVFit is applicable for predicting viral evolution through single amino acid mutations. Our study gives insight into the SARS-CoV-2 fitness landscape and provides a tool for efficiently identifying SARS-CoV-2 variants with higher epidemic risk.
© 2025. The Author(s).
Conflict of interest statement
Competing interests: J.I. has consulting fees and honoraria for lectures from Takeda Pharmaceutical Co. Ltd. Spyros Lytras has consulting fees from EcoHealth Alliance. K.S. has consulting fees from Moderna Japan Co., Ltd and Takeda Pharmaceutical Co. Ltd, and honoraria for lectures from Gilead Sciences, Inc., Moderna Japan Co., Ltd, and Shionogi & Co., Ltd. The other authors declare no competing interests.