Brief Bioinform. 2025 Mar 4;26(2):bbaf156.
doi: 10.1093/bib/bbaf156.

ScFold: a GNN-based model for efficient inverse folding of short-chain proteins via spatial reduction

Jiancheng Zhong et al. Brief Bioinform.

Abstract

In protein design, efficiently constructing sequences that fold accurately into predefined structures has become an important area of research. Although advances have been made for long-chain proteins, the design of short-chain proteins deserves equal attention. The structural information available in short and single chains is typically less comprehensive than that of full-length chains, which can degrade model performance on them. To address this challenge, we introduce ScFold, a novel model that incorporates an innovative node module. This module uses spatial dimensionality reduction and positional encoding mechanisms to enhance the extraction of structural features. Experimental results indicate that ScFold achieves a recovery rate of 52.22% on the CATH4.2 dataset, including a recovery rate of 41.6% for short-chain proteins. ScFold also reaches recovery rates of 59.32% and 61.59% on the TS50 and TS500 datasets, respectively, demonstrating its effectiveness across diverse protein types. In addition, we stratified the TS500 and CATH4.2 datasets by protein length and tested ScFold on the resulting length-specific sub-datasets. The results confirm the model's superiority in handling short-chain proteins. Finally, we selected several groups of protein sequences from the CATH4.2 dataset for structural visualization analysis and compared the model-generated sequences with the target sequences.

Keywords: attention mechanism; inverse folding; protein design; short chain; spatial reduction.
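
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how recovery rate and perplexity are commonly computed in inverse folding work. It assumes the usual definitions (fraction of matching residues, and the exponential of the mean negative log-probability of the native residues); the abstract does not spell out the exact evaluation protocol, and the function names `recovery_rate` and `perplexity` are illustrative, not taken from the ScFold code.

```python
import math
from typing import Sequence


def recovery_rate(predicted: str, target: str) -> float:
    """Fraction of positions where the predicted residue equals the target residue."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target sequences must have the same length")
    if not target:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / len(target)


def perplexity(native_probs: Sequence[float]) -> float:
    """exp(mean negative log-probability) over the probabilities a model
    assigns to the native residue at each position (assumed definition)."""
    if not native_probs:
        raise ValueError("need at least one probability")
    mean_nll = -sum(math.log(p) for p in native_probs) / len(native_probs)
    return math.exp(mean_nll)
```

Under these definitions, a model that guesses uniformly over the 20 standard amino acids has a perplexity of about 20, so lower perplexity together with higher recovery indicates a better fit to the native sequence.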

Conflict of interest statement

No competing interest is declared.

Figures

Figure 1
The overall framework of ScFold.
Figure 2
Recovery rates and perplexity of our method and PiFold on four subsets of TS500 with different sequence lengths.
Figure 3
We divided the CATH4.2 dataset into four subsets by protein sequence length and report the recovery rates and perplexity of our method and PiFold on each subset.
Figure 4
Case study comparing the sequences predicted by ScFold and PiFold with the target sequence.
