ScFold: a GNN-based model for efficient inverse folding of short-chain proteins via spatial reduction
- PMID: 40205854
- PMCID: PMC11982017
- DOI: 10.1093/bib/bbaf156
Abstract
In protein design, efficiently constructing protein sequences that fold accurately into predefined structures has become an important area of research. Although advances have been made in the study of long-chain proteins, the design of short-chain proteins deserves equal consideration. The structural information available in short and single chains is typically less comprehensive than that of full-length chains, which can degrade model performance on them. To address this challenge, we introduce ScFold, a novel model that incorporates an innovative node module. This module uses spatial dimensionality reduction and positional encoding mechanisms to enhance the extraction of structural features. Experimental results indicate that ScFold achieves a recovery rate of 52.22% on the CATH4.2 dataset and is notably effective for short-chain proteins, with a recovery rate of 41.6%. ScFold also achieves higher recovery rates of 59.32% and 61.59% on the TS50 and TS500 datasets, respectively, demonstrating its effectiveness across diverse protein types. In addition, we stratified the TS500 and CATH4.2 datasets by protein length and tested ScFold on the length-specific sub-datasets. The results confirm the model's superiority in handling short-chain proteins. Finally, we selected several protein sequence groups from the CATH4.2 dataset for structural visualization analysis and compared the model-generated sequences with the target sequences.
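The recovery rate reported above is commonly understood as the fraction of positions at which a designed sequence matches the native sequence. The following is a minimal sketch of that metric together with a length-stratified summary. It is not the authors' evaluation code: the function names, the bin boundaries (100 and 500 residues), and the use of a mean over sequences are all illustrative assumptions that the abstract does not specify.

```python
from collections import defaultdict


def recovery_rate(predicted: str, native: str) -> float:
    """Fraction of positions where the predicted residue equals the native one."""
    if len(predicted) != len(native):
        raise ValueError("predicted and native sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


def length_bin(length: int, short_max: int = 100, medium_max: int = 500) -> str:
    """Assign a length class. The thresholds are hypothetical, not from the abstract."""
    if length < short_max:
        return "short"
    if length < medium_max:
        return "medium"
    return "long"


def stratified_recovery(pairs):
    """Mean recovery rate per length class for (predicted, native) pairs.

    Averaging with a mean is an assumption; other aggregations are possible.
    """
    per_bin = defaultdict(list)
    for predicted, native in pairs:
        per_bin[length_bin(len(native))].append(recovery_rate(predicted, native))
    return {name: sum(vals) / len(vals) for name, vals in per_bin.items()}


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the study.
    toy_pairs = [("ACDE", "ACDF"), ("AAAA", "AAAA")]
    print(stratified_recovery(toy_pairs))
```

Under this definition, a recovery rate of 41.6% on short chains would mean that roughly four in ten designed residues coincide with the native residue at the same position.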
Keywords: attention mechanism; inverse folding; protein design; short chain; spatial reduction.
© The Author(s) 2025. Published by Oxford University Press.
Conflict of interest statement
No competing interest is declared.