Accurately predicting optimal conditions for microorganism proteins through geometric graph learning and language model
- PMID: 39739114
- PMCID: PMC11683147
- DOI: 10.1038/s42003-024-07436-3
Abstract
Proteins derived from microorganisms that survive in the harshest environments on Earth have stable activity under extreme conditions, providing rich resources for industrial applications and enzyme engineering. Because experimental determination is time-consuming, computational models are needed for fast and accurate prediction of protein optimal conditions. Previous studies were limited by the scarcity of data and the neglect of protein structures. To solve these problems, we constructed an up-to-date dataset of 175,905 non-redundant proteins and proposed a new model, GeoPoc, based on geometric graph learning, to predict protein optimal temperature, pH, and salt concentration. GeoPoc leverages protein structures and sequence embeddings extracted from a pre-trained language model, and further employs a geometric graph transformer network to capture sequence and spatial information. We first assessed robustness through in-house validation of optimal temperature prediction, achieving a Pearson correlation coefficient (PCC) of 0.78. The model was further validated on an independent test set, where GeoPoc surpassed the state-of-the-art method by 2.3% in AUC. Additionally, GeoPoc was extended to pH and salt concentration prediction, obtaining AUC scores of 0.78 and 0.77, respectively. Through further interpretability analysis, GeoPoc elucidates the critical physicochemical properties that contribute to enhancing protein thermostability.
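The two evaluation metrics named in the abstract, PCC and AUC, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `pearson_cc` and `roc_auc`, the toy temperatures, and the 50 °C cutoff used to form binary labels are all assumptions made for illustration only.

```python
import math


def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical measured vs. predicted optimal temperatures (degrees C).
    measured = [37.0, 55.0, 70.0, 85.0, 25.0]
    predicted = [40.0, 50.0, 75.0, 80.0, 30.0]
    print("PCC:", round(pearson_cc(measured, predicted), 3))

    # Hypothetical binary labels: 1 if the measured optimum is >= 50 C.
    labels = [1 if t >= 50.0 else 0 for t in measured]
    print("AUC:", roc_auc(labels, predicted))
```

PCC suits the regression setting (continuous optimal temperature), while AUC suits a classification setting in which proteins are assigned to classes; the pairwise formulation above is quadratic in the number of examples and is meant for clarity rather than speed.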
© 2024. The Author(s).
Conflict of interest statement
Competing interests: Y.Y. is an Editorial Board Member for Communications Biology, but was not involved in the editorial review of, nor the decision to publish this article. All the other authors declare no competing interests.
