SRPM-Sol: A Structure Robust Protein Multimodal Model for Solubility Prediction
- PMID: 40811309
- DOI: 10.1109/TCBBIO.2025.3569286
Abstract
The solubility of natural proteins is closely linked to their expression and purification. Accurate computational prediction of protein solubility not only aids functional assessment but also reduces the cost of preliminary wet-lab experiments. Mainstream deep learning prediction methods have begun to explore multimodal frameworks. However, existing multimodal models focus mainly on sequences and structures and overlook other influential factors. In addition, inherent errors in predicted structure information pose a significant challenge to model robustness. To address these issues, we introduce SRPM-Sol, a novel multimodal protein solubility prediction model. Built upon the state-of-the-art ESM3 model, the framework combines amino acid sequences, structure information, secondary structure sequences, and physicochemical properties for more accurate prediction. This makes it the protein multimodal model with the most diverse inputs to date for solubility prediction. To verify the effectiveness of our method, we constructed the first hierarchical dataset, PDE-Sol, by organizing data according to predicted Local Distance Difference Test (pLDDT) scores. Experimental results demonstrate that, compared with the baselines, SRPM-Sol achieves stronger robustness and higher accuracy across the different levels of PDE-Sol, even in the presence of uncertain structure information.
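The idea of organizing a dataset into levels by pLDDT can be illustrated with a minimal sketch. The abstract does not state how PDE-Sol defines its levels, so the thresholds below are an assumption: they roughly follow the conventional AlphaFold confidence bands (about 90, 70, and 50), and the function names, tier names, and example data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Assumed tiers: roughly the conventional AlphaFold pLDDT confidence bands,
# not necessarily the levels used to build PDE-Sol.
TIERS = [(90.0, "very_high"), (70.0, "confident"), (50.0, "low")]


def plddt_tier(per_residue_plddt):
    """Return a confidence tier for one protein from its mean pLDDT (0-100)."""
    score = mean(per_residue_plddt)
    for threshold, name in TIERS:
        if score >= threshold:
            return name
    return "very_low"


def stratify(proteins):
    """Group protein IDs by tier; `proteins` maps ID -> per-residue pLDDT list."""
    levels = defaultdict(list)
    for protein_id, scores in proteins.items():
        levels[plddt_tier(scores)].append(protein_id)
    return dict(levels)


# Hypothetical example data: three proteins with per-residue pLDDT scores.
example = {
    "P1": [95.0, 92.0, 91.0],
    "P2": [75.0, 80.0, 70.0],
    "P3": [40.0, 45.0, 35.0],
}
levels = stratify(example)
```

Evaluating a model separately on each resulting level is one way to check whether accuracy degrades as the predicted structures become less reliable.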