Review
Front Pharmacol. 2025 Apr 3:16:1498662.
doi: 10.3389/fphar.2025.1498662. eCollection 2025.

Protein structure prediction via deep learning: an in-depth review

Yajie Meng et al. Front Pharmacol. 2025.

Abstract

The application of deep learning algorithms in protein structure prediction has greatly influenced drug discovery and development. Accurate protein structures are crucial for understanding biological processes and designing effective therapeutics. Traditionally, experimental methods such as X-ray crystallography, nuclear magnetic resonance, and cryo-electron microscopy have been the gold standard for determining protein structures. However, these approaches are often costly, inefficient, and time-consuming. At the same time, the number of known protein sequences far exceeds the number of experimentally determined structures, creating a gap that computational approaches are needed to close. Over the past decade, deep learning has emerged as a promising solution to this challenge. This review provides a comprehensive guide to applying deep learning methodologies and tools in protein structure prediction. We first outline the databases related to protein structure prediction, then examine recently developed large language models and state-of-the-art deep learning-based methods. The review concludes with a perspective on the future of protein structure prediction, highlighting potential challenges and opportunities.

Keywords: deep learning; evaluation index; large language model; protein structure databases; protein structure prediction.

Conflict of interest statement

JY is employed by the company Geneis Beijing Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Four levels of protein structure.
FIGURE 2
We categorize protein structure prediction approaches into three types: template-based modeling (TBM), template-free modeling (TFM), and ab initio. Detailed steps of these three approaches are provided in the Introduction section.
FIGURE 3
Architecture of deep learning models.
(a) A DNN takes the protein sequence as input and outputs the protein structure after processing through several hidden layers.
(b) A CNN takes the pre-processed protein structure as input. Convolutional layers extract features and reduce noise, and pooling layers compress the data while preserving its features, which reduces overfitting. After several rounds of convolution and pooling, the data is compressed and abstracted into features with higher information content, and a fully connected layer produces the final result.
(c) An RNN takes protein sequence data as input and stacks additional network layers for vertical expansion, using chaining and recursion to obtain the prediction results.
(d) An LSTM addresses the long-term dependency problem of general RNNs, including difficulties with long-term memory and with gradients in backpropagation.
(e) A GRU, a variant of the LSTM, achieves comparable results while running more efficiently and substantially improving training efficiency.
(f) A GNN takes the amino acid sequence of the protein as input and abstracts the protein structure as a graph. Node and edge features are extracted by node embedding and edge embedding to obtain a node translation path and an edge translation path. In the node translation path, each amino acid is treated as a node whose feature vector typically encodes the physicochemical properties of the amino acid. The edge translation path focuses on the interactions between amino acids in the protein sequence: edges represent the relationships between nodes (amino acids), and updating the edge weights captures these interactions, so that the graph reflects the three-dimensional structural characteristics of the protein. The geometry of the 3D protein backbone is then predicted from the distance geometric graph representation and the dihedral geometric graph representation, respectively.
(g) A deep residual neural network takes protein templates and query sequences as input and predicts the protein 3D structure from the input feature tensor.
(h) Large language models are trained on processed data, often using techniques such as transfer learning from pre-trained models.
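The graph abstraction described in panel (f) can be illustrated with a minimal sketch. It is not the architecture of any model covered in the review. It assumes Kyte-Doolittle hydropathy and a simple formal charge as the physicochemical node features, connects residues that are close in sequence (a stand-in for spatial contacts, which are unknown before prediction), and uses fixed edge weights where a real GNN would learn them. The function names `build_residue_graph` and `message_passing_step` are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified formal charge (assumption: histidine is treated as neutral).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def build_residue_graph(sequence, max_separation=2):
    """Return (nodes, edges) for a protein sequence.

    nodes: one [hydropathy, charge] feature vector per residue.
    edges: {(i, j): weight} for residue pairs with j - i <= max_separation.
    The weight 1 / (j - i) is an illustrative choice, not a learned value.
    """
    nodes = [[KD[aa], CHARGE.get(aa, 0.0)] for aa in sequence]
    edges = {}
    n = len(sequence)
    for i in range(n):
        for j in range(i + 1, min(i + max_separation, n - 1) + 1):
            edges[(i, j)] = 1.0 / (j - i)
    return nodes, edges


def message_passing_step(nodes, edges):
    """One round of weighted-mean neighbor aggregation over an undirected graph.

    Each node's new features are the average of its own features and the
    edge-weighted mean of its neighbors' features.
    """
    n, dim = len(nodes), len(nodes[0])
    agg = [[0.0] * dim for _ in range(n)]
    weight_sum = [0.0] * n
    for (i, j), w in edges.items():
        for d in range(dim):
            agg[i][d] += w * nodes[j][d]
            agg[j][d] += w * nodes[i][d]
        weight_sum[i] += w
        weight_sum[j] += w
    updated = []
    for i in range(n):
        if weight_sum[i] == 0.0:
            # Isolated node: keep its features unchanged.
            updated.append(list(nodes[i]))
        else:
            updated.append([
                0.5 * nodes[i][d] + 0.5 * agg[i][d] / weight_sum[i]
                for d in range(dim)
            ])
    return updated
```

Calling `build_residue_graph("ACDK")` and passing the result to `message_passing_step` returns one updated feature vector per residue. In a trained model, the fixed weights and averaging would be replaced by learned transformations, and the updated node and edge features would feed the distance and dihedral predictions the caption describes.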
FIGURE 4
Diagram illustrating the future research hotspots and application scope of deep learning-based protein structure prediction. Protein structure prediction is the basis for research on disease diagnosis, drug repositioning, and vaccine development. Future research may use deep learning algorithms (DNN, CNN, GNN, RNN, LSTM, ResNet, and LLM) to predict the 3D structure of proteins, with hotspots including the detection of remote homologous sequences, the interpretability of protein structure predictions, and protein domain boundary prediction.
