Review. 2024 Aug 1;25(15):8426.
doi: 10.3390/ijms25158426.

AI-Driven Deep Learning Techniques in Protein Structure Prediction

Lingtao Chen et al. Int J Mol Sci.

Abstract

Protein structure prediction is important for understanding protein function and behavior. This review surveys the computational models used to predict protein structure, covering the progression from established protein modeling to state-of-the-art artificial intelligence (AI) frameworks. The paper starts with a brief introduction to protein structures, protein modeling, and AI. The section on established protein modeling discusses homology modeling, ab initio modeling, and threading. The next section covers deep learning-based models. It introduces state-of-the-art AI models such as the AlphaFold family (AlphaFold, AlphaFold2, AlphaFold3), RoseTTAFold, and ProteinBERT, and discusses how AI techniques have been integrated into established frameworks like Swiss-Model, Rosetta, and I-TASSER. Model performance is compared using the rankings of CASP14 (Critical Assessment of Structure Prediction) and CASP15. CASP16 is ongoing, and its results are not included in this review. Continuous Automated Model EvaluatiOn (CAMEO) complements the biennial CASP experiment. The template modeling score (TM-score), global distance test total score (GDT_TS), and Local Distance Difference Test (lDDT) score are also discussed. The paper then acknowledges the ongoing difficulties in predicting protein structure and emphasizes the need for further research on topics such as dynamic protein behavior, conformational changes, and protein-protein interactions. The application section introduces applications in fields such as drug design, industry, education, and novel protein development. In summary, this paper provides a comprehensive overview of the latest advancements in established protein modeling and deep learning-based models for protein structure prediction. It highlights the significant advances achieved by AI and identifies potential areas for further investigation.

Keywords: AlphaFold; artificial intelligence; bioinformatics; computational methods; deep learning; healthcare; machine learning; protein modeling; protein structure; transformer.
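
The abstract names TM-score and GDT_TS as evaluation metrics without showing how they are computed. The following is a minimal sketch of both, not the review's own code, and it rests on a simplifying assumption: the model and reference C-alpha coordinates are already superimposed and matched residue by residue. The actual metrics search for the superposition that maximizes the score, which this sketch does not do. The function names are hypothetical, and the 0.5 Å floor on d0 for short chains follows a convention used in common implementations rather than anything stated in this record.

```python
import math

def per_residue_distances(model, reference):
    """Distances between matched C-alpha atoms (assumes a fixed superposition)."""
    if len(model) != len(reference):
        raise ValueError("model and reference must have the same number of residues")
    return [math.dist(p, q) for p, q in zip(model, reference)]

def tm_score_fixed(distances, target_length):
    """TM-score for one given superposition (no search over superpositions).

    TM = (1 / L) * sum_i 1 / (1 + (d_i / d0)^2),
    with d0 = 1.24 * (L - 15)^(1/3) - 1.8 and L the target length.
    """
    if target_length <= 21:
        d0 = 0.5  # assumption: floor for short chains, where the formula breaks down
    else:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

def gdt_ts_fixed(distances, target_length):
    """GDT_TS for one given superposition: mean percentage of residues
    within 1, 2, 4 and 8 angstroms of their reference positions."""
    fractions = [
        sum(1 for d in distances if d <= cutoff) / target_length
        for cutoff in (1.0, 2.0, 4.0, 8.0)
    ]
    return 100.0 * sum(fractions) / len(fractions)

if __name__ == "__main__":
    # Toy coordinates (made up for illustration): a 30-residue straight chain.
    reference = [(3.8 * i, 0.0, 0.0) for i in range(30)]
    shifted = [(x, y + 3.0, z) for x, y, z in reference]  # every atom 3 angstroms off
    d = per_residue_distances(shifted, reference)
    print(tm_score_fixed(d, len(reference)), gdt_ts_fixed(d, len(reference)))
```

Because both functions take a precomputed distance list, they give a lower bound on the true scores for any superposition that is not optimal. TM-score is normalized by target length so that scores are comparable across proteins of different sizes, while GDT_TS averages over four distance cutoffs.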

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The flowchart of this review paper. It shows the overall flow of this paper, including the sequence of sections and their interconnections.
Figure 2
Sample FASTA file for protein (PDB ID 7SF8 [78]) with 4 chains.
Figure 3
Four levels of protein structures. (A) The primary structure is shown as 3-letter codes, unlike Figure 2. The sequence is randomly written as a demonstration. (B–D) The secondary structure shows alpha helices as an example. Secondary, tertiary, and quaternary structures are visualized in PyMOL [80], a visualization tool for molecules and macromolecules such as proteins. The PDB ID used is 7SF8 [78].
Figure 4
Comparison of the experimental structure of a protein (PDB ID 7SF8 [78]) and the structure predicted by AlphaFold2. (A) PDB file of 7SF8 [78] shown in PyMOL [80]. (B) Structure predicted by AlphaFold2 from the protein sequence (PDB ID 7SF8 [78]), with confidence scores. A higher score means the model is more confident in the correctness of the prediction. (C) Panels (A) and (B) shown together in PyMOL [80]. Purple is the AlphaFold2 prediction. (D–F) Zoomed-in areas where AlphaFold2 has low confidence scores. There are some significant differences in these areas.
