Interdiscip Sci. 2025 Jun 8.
doi: 10.1007/s12539-025-00732-4. Online ahead of print.

CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation

Wenrui Gou et al. Interdiscip Sci. 2025.

Abstract

Protein structures are fundamental to understanding protein functions and interactions. With the continuous advancement of protein structure prediction methods, structure databases are rapidly expanding. Identifying the origin of a protein structure is crucial for assessing the reliability of experimental resolution and computational prediction methods, as well as for guiding downstream biological research. Existing protein representation approaches often fail to capture subtle yet critical structural differences, which makes precise structural traceability difficult. To address this, we propose a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro), for the representation and origin evaluation of protein structures. CPE-Pro integrates a pre-trained protein Structural Sequence Language Model (SSLM) and a Geometric Vector Perceptron-Graph Neural Network (GVP-GNN) to learn structure-aware protein representations and capture structural differences, enabling accurate classification across four origins of structural data. Preliminary results indicate that structural sequences enriched with local structural features enable the model to capture more informative protein characteristics than large-scale protein language models trained on extensive amino acid sequences, thereby enhancing and refining protein representations. Future research directions include extending the architecture to additional protein structure paradigms and developing evaluation methodologies for low-pLDDT predicted structures, providing more effective tools for protein structure analysis. The code, model weights, and all relevant materials are available at https://github.com/wr1102/CPE-Pro.

Keywords: Deep learning; Origin evaluation; Protein representation; Structural sequence.

Conflict of interest statement

Declarations. Conflict of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
