Enhancing Structure-Aware Protein Language Models with Efficient Fine-Tuning for Various Protein Prediction Tasks
- PMID: 40601249
- DOI: 10.1007/978-1-0716-4623-6_2
Abstract
Proteins are central to a wide range of biological and engineering processes. Large protein language models (PLMs) can significantly advance our understanding and engineering of proteins. However, the effectiveness of PLMs in prediction and design rests largely on representations derived from protein sequences alone. Without incorporating the three-dimensional (3D) structures of proteins, PLMs overlook crucial aspects of how proteins interact with other molecules, limiting their predictive accuracy. To address this issue, we present S-PLM, a 3D structure-aware PLM that employs multi-view contrastive learning to align protein sequences with their 3D structures in a unified latent space. Previously, we encoded structural information with a contact map-based approach, applying a Swin-Transformer to contact maps derived from AlphaFold-predicted protein structures. This work introduces a new approach that leverages a geometric vector perceptron (GVP) model to process 3D coordinates directly and obtain structural embeddings. We focus on applying structure-aware models to protein-related tasks, using efficient fine-tuning methods to achieve optimal performance without significant computational cost. Our results show that S-PLM outperforms sequence-only PLMs across all protein clustering and classification tasks, achieving performance on par with state-of-the-art methods that require both sequence and structure inputs. S-PLM and its tuning tools are available at https://github.com/duolinwang/S-PLM/.
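To make the multi-view contrastive alignment concrete, below is a minimal sketch of a CLIP-style symmetric InfoNCE objective between per-protein sequence and structure embeddings. This illustrates the general technique rather than the exact S-PLM objective: the function name, the temperature value, and the assumption that structure embeddings come from a GVP (or Swin-Transformer) encoder are ours, not from the chapter.

```python
import torch
import torch.nn.functional as F

def symmetric_infonce(seq_emb: torch.Tensor, struct_emb: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style contrastive loss aligning sequence and structure views.

    seq_emb:    (B, D) per-protein embeddings from a sequence encoder (e.g. ESM-2).
    struct_emb: (B, D) per-protein embeddings from a structure encoder
                (a GVP over 3D coordinates, or a Swin-Transformer over contact maps).
    """
    seq = F.normalize(seq_emb, dim=-1)
    struct = F.normalize(struct_emb, dim=-1)
    logits = seq @ struct.t() / temperature                  # (B, B) cosine similarities
    targets = torch.arange(seq.size(0), device=seq.device)  # matched pairs on the diagonal
    # Each row (sequence -> structure) and each column (structure -> sequence)
    # is a B-way classification whose correct class is the paired protein.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Minimizing such a loss pulls each protein's two views together in the shared latent space while pushing apart the embeddings of different proteins, which is the general mechanism behind sequence-structure alignment.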
Keywords: 3D structure-aware PLM; ESM-2; Efficient fine-tuning; Large protein language models (PLMs); Multi-view contrastive learning.
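The keywords above highlight efficient fine-tuning on an ESM-2 backbone. The S-PLM repository provides its own tuning tools; as a generic, hedged illustration of parameter-efficient fine-tuning, the sketch below attaches LoRA adapters to an ESM-2 checkpoint using the Hugging Face transformers and peft libraries. The checkpoint name, LoRA rank, and target modules are illustrative choices, not settings taken from the chapter.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative backbone and task: binary protein classification on top of ESM-2.
model_name = "facebook/esm2_t33_650M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# LoRA adapters on the attention query/value projections; rank and alpha
# are common defaults, not values from the chapter.
lora_config = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16,
                         lora_dropout=0.05, target_modules=["query", "value"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights will train

# Forward pass on a toy protein sequence.
inputs = tokenizer("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", return_tensors="pt")
logits = model(**inputs).logits
```

Only the adapter weights (plus the new classification head) receive gradients, which is what keeps this style of fine-tuning cheap relative to updating the full backbone.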
© 2025. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.
Similar articles
- Boost Protein Language Model with Injected Structure Information Through Parameter Efficient Fine-tuning. Comput Biol Med. 2025 Sep;195:110607. doi: 10.1016/j.compbiomed.2025.110607. Epub 2025 Jun 30. PMID: 40592174.
- S-PLM: Structure-Aware Protein Language Model via Contrastive Learning Between Sequence and Structure. Adv Sci (Weinh). 2025 Feb;12(5):e2404212. doi: 10.1002/advs.202404212. Epub 2024 Dec 12. PMID: 39665266. Free PMC article.
- Large Language Model (LLM)-Based Advances in Prediction of Post-translational Modification Sites in Proteins. Methods Mol Biol. 2025;2941:313-355. doi: 10.1007/978-1-0716-4623-6_19. PMID: 40601266. Review.
- Advancing the accuracy of clathrin protein prediction through multi-source protein language models. Sci Rep. 2025 Jul 8;15(1):24403. doi: 10.1038/s41598-025-08510-4. PMID: 40628826. Free PMC article.
- Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19. Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3. PMID: 35593186. Free PMC article.