ScFold: a GNN-based model for efficient inverse folding of short-chain proteins via spatial reduction

Jiancheng Zhong¹, Zhiwei Zou¹, Jie Qiu¹, Shaokai Wang²

Affiliations

¹ College of Information Science and Engineering, Hunan Normal University, 36 Lushan Road, Yuelu District, Changsha 410081, Hunan, China.
² Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China.

PMID: 40205854
PMCID: PMC11982017
DOI: 10.1093/bib/bbaf156

Jiancheng Zhong et al. Brief Bioinform. 2025 Mar 4;26(2):bbaf156. doi: 10.1093/bib/bbaf156.

Abstract

In protein design, efficiently constructing sequences that fold accurately into predefined structures has become an important area of research. Although progress has been made on long-chain proteins, the design of short-chain proteins deserves equal attention. Short and single chains typically carry less structural information than full-length chains, which can degrade model performance on them. To address this challenge, we introduce ScFold, a model built around a new node module that uses spatial dimensionality reduction and positional encoding to improve the extraction of structural features. Experimental results indicate that ScFold achieves a recovery rate of 52.22% on the CATH4.2 dataset, with a recovery rate of 41.6% on short-chain proteins. ScFold also reaches recovery rates of 59.32% and 61.59% on the TS50 and TS500 datasets, respectively, demonstrating its effectiveness across diverse protein types. In addition, we stratified the TS500 and CATH4.2 datasets by protein length and tested ScFold on the length-specific subsets; the results confirm the model's advantage on short-chain proteins. Finally, we selected several groups of protein sequences from the CATH4.2 dataset for structural visualization and compared the model-generated sequences with the target sequences.
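The abstract reports recovery rate and (in the figures) perplexity without defining them. The sketch below uses the definitions commonly assumed in inverse-folding work: recovery is the fraction of positions where the designed residue equals the native one, and perplexity is the exponential of the mean negative log-probability assigned to the native residues. The function names and the per-sequence formulation are illustrative assumptions; how the paper aggregates scores across proteins is not stated here.

```python
import math


def recovery_rate(predicted: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native one."""
    if len(predicted) != len(native):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


def perplexity(native_probs: list[float]) -> float:
    """exp(mean negative log-probability) over the native residues.

    native_probs[i] is the probability the model assigned to the
    native amino acid at position i (assumed to be > 0).
    """
    if not native_probs:
        raise ValueError("need at least one probability")
    mean_nll = -sum(math.log(p) for p in native_probs) / len(native_probs)
    return math.exp(mean_nll)
```

Under these definitions, a model that assigns a uniform probability of 1/20 to every amino acid has a perplexity of 20, so lower values indicate a more confident and more accurate model.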

Keywords: attention mechanism; inverse folding; protein design; short chain; spatial reduction.

Conflict of interest statement

No competing interest is declared.

Figures

**Figure 1**
The figure illustrates the overall framework of ScFold.

**Figure 2**
We report the recovery rates and perplexity of our method and PiFold on four subsets of TS500 with different sequence lengths.

**Figure 3**
We divided the CATH4.2 dataset into four subsets based on protein sequence length, and we report the recovery rates and perplexity of our method and PiFold on these four subsets.

**Figure 4**
A case study was conducted and the sequences predicted by ScFold and PiFold were compared with the target sequence.
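The length-stratified evaluation described for Figures 2 and 3 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the cut-offs in `BOUNDARIES` are hypothetical (the record above does not give the actual length thresholds), and the mean per-protein recovery is an assumed aggregation.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical upper bounds (inclusive) of the first three length bins;
# the fourth bin holds everything longer. The paper's real cut-offs are not given here.
BOUNDARIES = (100, 200, 300)


def stratified_recovery(pairs, boundaries=BOUNDARIES):
    """Group (predicted, native) pairs by native length and average recovery per bin.

    Returns {bin_index: mean recovery}, where bin 0 is the shortest group.
    """
    per_bin = defaultdict(list)
    for predicted, native in pairs:
        if not native or len(predicted) != len(native):
            raise ValueError("each pair needs equal-length, non-empty sequences")
        matches = sum(p == n for p, n in zip(predicted, native))
        bin_index = bisect_left(boundaries, len(native))
        per_bin[bin_index].append(matches / len(native))
    return {b: sum(scores) / len(scores) for b, scores in sorted(per_bin.items())}
```

With the default boundaries, a 100-residue protein falls in bin 0 and a 101-residue protein in bin 1; bins that receive no proteins are simply absent from the result.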
