Review

Genomics Proteomics Bioinformatics. 2023 Oct;21(5):913-925.

doi: 10.1016/j.gpb.2022.11.014. Epub 2023 Mar 30.

Protein Structure Prediction: Challenges, Advances, and the Shift of Research Paradigms

Bin Huang¹, Lupeng Kong², Chao Wang³, Fusong Ju⁴, Qi Zhang⁵, Jianwei Zhu⁴, Tiansu Gong¹, Haicang Zhang⁶, Chungong Yu⁷, Wei-Mou Zheng⁸, Dongbo Bu⁹

Affiliations

¹ Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China.
² Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Changping Laboratory, Beijing 102206, China.
³ Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.
⁴ Microsoft Research AI4Science, Beijing 100080, China.
⁵ Huawei Noah's Ark Lab, Wuhan 430206, China.
⁶ Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China. Electronic address: zhanghaicang@ict.ac.cn.
⁷ Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China. Electronic address: yuchungong@ict.ac.cn.
⁸ Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China. Electronic address: zheng@mail.itp.ac.cn.
⁹ Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China. Electronic address: dbu@ict.ac.cn.

PMID: 37001856
PMCID: PMC10928435
DOI: 10.1016/j.gpb.2022.11.014

Bin Huang et al. Genomics Proteomics Bioinformatics. 2023 Oct.

Abstract

Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start by assuming a probability distribution of protein structures given a target sequence and then find the most likely structure; and computer scientists formulate protein structure prediction as an optimization problem, that is, finding the structural conformation with the lowest energy or minimizing the difference between the predicted structure and the native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely, data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we present a survey of efforts in protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories that interpret the neural networks and knowledge of protein folding are still highly desired.

Keywords: Deep learning; Language model; Protein folding; Protein structure prediction; Transformer.

Conflict of interest statement

Fusong Ju and Jianwei Zhu are current employees of Microsoft Corp. Qi Zhang is a current employee of Huawei Technologies Co., Ltd. All the other authors have declared no competing interests.

Figures

**Figure 1**
**Protein sequence, protein structure, and protein structure prediction** A. An example of protein sequence and its tertiary structure. Here, we show a C-terminal fragment of the ribosomal protein L7/L12 from *Escherichia coli* (PDB: 1CTF), which consists of a total of 74 residues linked via peptide bonds. The tertiary structure specifies the unique 3D coordinates of each atom in the relative position of the whole protein. Cartoon backbone representation is widely used to visualize protein tertiary structure. B. Homology modeling method for protein structure prediction. C. Threading method for protein structure prediction. D. *Ab initio* prediction approach. PDB, Protein Data Bank; 3D, 3-dimensional.

**Figure 2**
**Chronological diagram of the representative approaches to protein structure prediction** Here, homology modeling approaches are shown in red, template-based approaches are shown in green, *ab initio* approaches are shown in blue, and other techniques are shown in black.

**Figure 3**
**Performance of representative approaches to protein structure prediction** A. Performance of the prediction approaches in previous CASPs. Trendlines indicate the agreement of the target protein backbone for the best-predicted structures with that of the native structures in the last 14 CASP rounds; open circles indicate the individual data points for CASP14. Target difficulty is based on sequence and structural similarity to existing experimental protein structures, which was adapted from with permission. B. Prediction performance of AlphaFold2 for 20,296 human proteins covering 10,537,122 residues. For each protein, AlphaFold2 outputs a pLDDT score as an estimation of the prediction quality. For nearly 36% of proteins, AlphaFold2 predicts their structures with high confidence (pLDDT ≥ 90). The data were taken from . C. The performance of the prediction approaches using MSAs or a single sequence as input. On 29 selected CASP-free modeling targets, AlphaFold2 and RoseTTAFold show excellent accuracy when using MSAs of query proteins as input. However, their performances decrease sharply when using a single sequence of query protein as their only input. In contrast, OmegaFold and ProFOLD Single, the approaches specially designed for single-sequence prediction, achieve high accuracies that approximate the approaches using MSAs. It should be noted that the accuracy of ProFOLD Single is acquired from CASP14 target proteins to avoid overlapping between training and test data, which was adapted from with modifications. CASP, Critical Assessment of Structure Prediction; GDT_TS, Global Distance Test-Total Score; MSA, multiple sequence alignment; pLDDT, predicted Local Distance Difference Test.

**Figure 4**
**Strong structural signals in protein Se0862 (PDB: 6UF2)** Three types of regions that might carry strong structural signals, including single helical turn (blue), β-turn (red), and a pair of secondary structural elements with contact between them (purple).

**Figure 5**
**An example of an inter-residue contact in GFP (PDB: 4EUL) and co-mutations observed in its homologs** Two residues in contact, 55V–106Y (shown in red), co-mutate to 55I–106F (in green) to maintain the contact between them; in turn, the co-mutations observed in homologous proteins can be exploited to infer inter-residue contacts. To demonstrate this, we use ProDESIGN-LE2, a protein sequence design method, to design four sequences (P1–P4) for the structure of GFP. As the design process of ProDESIGN-LE2 resembles the evolution of the target protein, the resulting designed sequences could be used as an approximation of the homologs of the target protein. ProDESIGN-LE2 is an improved version of ProDESIGN-LE . GFP, green fluorescent protein.
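
The idea in the Figure 5 caption, that columns of an alignment which change together hint at residues in contact, can be illustrated with a small sketch. The caption does not name a specific statistic; mutual information between alignment columns is used here only as a simple, classical stand-in, and the toy alignment below is hypothetical (it merely echoes the V/Y to I/F pattern and is not the P1–P4 sequences from the figure). Modern predictors rely on considerably more elaborate models.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    High values mean the residues at the two positions vary together
    across the aligned sequences, a possible sign of co-mutation.
    """
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (count_i[a] * count_j[b]))
    return mi


# Hypothetical toy alignment: positions 1 and 3 co-vary (V with Y, I with F),
# while position 2 varies independently of position 1.
msa = ["AVKYL", "AIKFL", "AVRYL", "AIRFL"]
columns = list(zip(*msa))  # transpose: one tuple per alignment column

covarying = mutual_information(columns[1], columns[3])
independent = mutual_information(columns[1], columns[2])
print(f"MI(pos 1, pos 3) = {covarying:.2f} bits")
print(f"MI(pos 1, pos 2) = {independent:.2f} bits")
```

In this toy case the co-varying pair scores higher than the independent pair, which is the signal a contact predictor would pick up on; real alignments also require corrections for phylogenetic bias and indirect couplings that this sketch omits.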

**Figure 6**
**Predicted structures for CASP14 targets T1049-D1, T1031-D1, and T1067-D1 by AlphaFold2, BAKER, Zhang-Server, and RaptorX** For each representative target (in rows) in a target group defined in CASP14 and each predicting method (in columns), the alignment between the predicted structure (red) and the native structure (blue) is shown. Targets are mainly classified into TBM and FM categories using their prediction quality and template detectability. TBM, template-based modeling; FM, free modeling; TM-score, template modeling score.

Cited by

In silico approaches to study the human asparagine synthetase: An insight of the interaction between the enzyme active sites and its substrates.
Riaz A, Kaleem A, Abdullah R, Iqtedar M, Hoessli DC, Aftab M. PLoS One. 2024 Aug 2;19(8):e0307448. doi: 10.1371/journal.pone.0307448. eCollection 2024. PMID: 39093903. Free PMC article.
In silico approaches supporting drug repurposing for Leishmaniasis: a scoping review.
Scheiffer G, Domingues KZA, Gorski D, Cobre AF, Lazo REL, Borba HHL, Ferreira LM, Pontarolo R. EXCLI J. 2024 Sep 3;23:1117-1169. doi: 10.17179/excli2024-7552. eCollection 2024. PMID: 39421030. Free PMC article.
Characterization of soil-derived Bacillus subtilis metabolites against breast cancer: In vitro and in silico studies.
Hashmi AI, Iqtedar M, Saeed H, Ahmed N, Abdullah R, Kaleem A, Abbasi MA. Saudi Pharm J. 2025 Apr 17;33(1-2):3. doi: 10.1007/s44446-025-00006-6. PMID: 40397331. Free PMC article.
A comprehensive review of artificial intelligence for pharmacology research.
Li B, Tan K, Lao AR, Wang H, Zheng H, Zhang L. Front Genet. 2024 Sep 3;15:1450529. doi: 10.3389/fgene.2024.1450529. eCollection 2024. PMID: 39290983. Free PMC article. Review.
An overview on olfaction in the biological, analytical, computational, and machine learning fields.
Chiera F, Costa G, Alcaro S, Artese A. Arch Pharm (Weinheim). 2025 Jan;358(1):e2400414. doi: 10.1002/ardp.202400414. Epub 2024 Oct 22. PMID: 39439128. Free PMC article. Review.
