Bioinformatics. 2023 Jan 1;39(1):btad030.
doi: 10.1093/bioinformatics/btad030.

3D-equivariant graph neural networks for protein model quality assessment

Chen Chen et al. Bioinformatics. 2023.

Abstract

Motivation: Quality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. Recent end-to-end deep learning methods for protein structure prediction can generate highly confident tertiary structures for most proteins. Because these models are of higher quality and have different properties than models predicted by traditional tertiary structure prediction methods, it is important to explore corresponding QA strategies for evaluating and selecting them.

Results: We develop EnQA, a novel graph-based 3D-equivariant neural network method, equivariant to rotation and translation of 3D objects, that estimates the accuracy of protein structural models by leveraging structural features acquired from the state-of-the-art tertiary structure prediction method AlphaFold2. We train and test the method both on traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction, CASP) and on a new dataset of high-quality structural models predicted only by AlphaFold2 for proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted both by traditional protein structure prediction methods and by the latest end-to-end deep learning method, AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to evaluating protein structural models, and that integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA.

Availability and implementation: The source code is available at https://github.com/BioinfoMachineLearning/EnQA.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Illustration of the local spherical coordinate system. Different colors indicate atoms from different residues. Here, θ and φ are the spherical angles and r is the radial distance of the vector between the alpha carbons (Cα) of two residues (blue and red).
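The caption defines θ, φ and r for the Cα-Cα vector, but it does not say how the local frame attached to each residue is built. The sketch below assumes a common construction from the backbone N, Cα and C atoms (Gram-Schmidt); `local_frame` and `spherical_coords` are hypothetical helper names, not functions from the EnQA repository.

```python
import numpy as np

def local_frame(n, ca, c):
    """Orthonormal frame centred on a residue's C-alpha atom.

    ASSUMPTION: axes are built from the backbone N, CA and C atoms by
    Gram-Schmidt; EnQA's exact convention is not given in the caption.
    """
    x = c - ca
    x = x / np.linalg.norm(x)
    v = n - ca
    y = v - np.dot(v, x) * x          # remove the component of v along x
    y = y / np.linalg.norm(y)
    z = np.cross(x, y)                # right-handed third axis
    return np.stack([x, y, z])        # rows are the local axes

def spherical_coords(frame, ca_i, ca_j):
    """Return (r, theta, phi) of the CA_i -> CA_j vector in residue i's frame."""
    d = frame @ (ca_j - ca_i)         # express the vector in the local axes
    r = np.linalg.norm(d)
    theta = np.arccos(np.clip(d[2] / r, -1.0, 1.0))  # polar angle from local z
    phi = np.arctan2(d[1], d[0])                     # azimuth in the local x-y plane
    return r, theta, phi
```

Because the frame moves with the residue, r, θ and φ stay the same when the whole model is rotated and translated, which is what makes them usable as invariant input features.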
Fig. 2.
Overall architecture of EnQA. The 1D/2D features from the input model are first converted into hidden node and edge features for the 3D-equivariant graph module, and the spatial coordinates of the residues' Cα atoms are used as an extra feature. The node and edge network modules update the graph features iteratively. Finally, the 3D-equivariant network predicts the per-residue lDDT scores and the distance errors of residue pairs from the updated node/edge features and spatial coordinates.
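To make the equivariance property concrete, here is a minimal sketch of one E(n)-equivariant message-passing layer in the style of the EGNN update of Satorras et al. (2021). Whether EnQA uses exactly this update is an assumption; `ToyEGNNLayer`, the layer sizes and the random weights are hypothetical stand-ins for trained parameters.

```python
import numpy as np

def silu(z):
    return z / (1.0 + np.exp(-z))

class ToyEGNNLayer:
    """One EGNN-style layer: invariant node update, equivariant coordinate update."""

    def __init__(self, node_dim, edge_dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        in_e = 2 * node_dim + 1 + edge_dim  # h_i, h_j, squared distance, edge feature
        self.W_e = rng.normal(scale=0.1, size=(in_e, hidden))
        self.w_x = rng.normal(scale=0.1, size=(hidden,))
        self.W_h = rng.normal(scale=0.1, size=(node_dim + hidden, node_dim))

    def __call__(self, h, x, e):
        # h: (N, node_dim) node features; x: (N, 3) CA coordinates; e: (N, N, edge_dim)
        n = h.shape[0]
        diff = x[:, None, :] - x[None, :, :]               # (N, N, 3) relative vectors
        dist2 = np.sum(diff ** 2, axis=-1, keepdims=True)  # invariant to rigid motion
        hi = np.broadcast_to(h[:, None, :], (n, n, h.shape[1]))
        hj = np.broadcast_to(h[None, :, :], (n, n, h.shape[1]))
        m = silu(np.concatenate([hi, hj, dist2, e], axis=-1) @ self.W_e)  # (N, N, hidden)
        m = m * (1.0 - np.eye(n))[:, :, None]              # drop self-messages
        # Coordinates move along relative vectors, so they rotate with the input.
        x_new = x + np.sum(diff * (m @ self.w_x)[..., None], axis=1) / (n - 1)
        # Nodes are updated only from invariant messages, so they stay invariant.
        h_new = h + silu(np.concatenate([h, m.sum(axis=1)], axis=-1) @ self.W_h)
        return h_new, x_new
```

The design choice is that messages see geometry only through squared distances, so rotating or translating the input leaves `h_new` unchanged and transforms `x_new` in the same way as the input coordinates.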
Fig. 3.
Distribution of lDDT scores of the AlphaFold test models. The x-axis lists the targets in increasing order of the mean lDDT of their models. Red dots mark the median, and bars show the upper and lower range of model quality for each target.
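The figures report quality as lDDT. As a reminder of what that score measures, the sketch below computes a simplified per-residue lDDT on Cα atoms only, using the usual 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerance thresholds. The full lDDT is defined over all atoms, so this Cα-only version (and the name `ca_lddt`) is an illustrative simplification, not the scoring code used in the paper.

```python
import numpy as np

def ca_lddt(pred, true, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Per-residue lDDT on C-alpha atoms only (simplified).

    pred, true: (N, 3) coordinates of the model and the experimental structure.
    """
    d_true = np.linalg.norm(true[:, None] - true[None, :], axis=-1)
    d_pred = np.linalg.norm(pred[:, None] - pred[None, :], axis=-1)
    # Pairs scored: within the inclusion radius in the reference, excluding i == j.
    mask = (d_true < cutoff) & ~np.eye(len(true), dtype=bool)
    err = np.abs(d_true - d_pred)
    # Fraction of preserved distances per residue at each threshold.
    preserved = np.stack([(err < t) & mask for t in thresholds])  # (T, N, N)
    counts = np.maximum(mask.sum(axis=1), 1)                      # avoid division by zero
    return (preserved.sum(axis=2) / counts).mean(axis=0)          # average over thresholds
```

Because only distances are compared, no superposition is needed: a model that is a rigidly moved copy of the reference scores 1.0 at every residue.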
Fig. 4.
Comparison between the predicted and true lDDT scores of the AlphaFold2_test models for two methods (the AF2-reported score and EnQA-MSA). Here the residue-level correlation is computed over all residues at once, unlike the average of the per-model residue-level correlations used in Sections 3.1 and 3.2. r, Pearson correlation coefficient; ρ, Spearman correlation coefficient. The lDDT scores predicted by EnQA-MSA correlate more strongly with the true lDDT scores than the scores self-reported by AlphaFold2.
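The caption distinguishes a correlation pooled over all residues from the average of per-model correlations. The sketch below computes both with plain NumPy (`scipy.stats.pearsonr` and `scipy.stats.spearmanr` are standard alternatives); the helper names are hypothetical. The two summaries can differ substantially, even in sign, when models have different offsets.

```python
import numpy as np

def pearson(a, b):
    a = np.asarray(a, float) - np.mean(a)
    b = np.asarray(b, float) - np.mean(b)
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

def rankdata(a):
    """Ranks starting at 1; tied values share their average rank."""
    a = np.asarray(a, float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def spearman(a, b):
    return pearson(rankdata(a), rankdata(b))

def pooled_vs_per_model(pred_by_model, true_by_model):
    """Each argument is a list of per-residue score arrays, one per model."""
    pooled = pearson(np.concatenate(pred_by_model), np.concatenate(true_by_model))
    per_model = float(np.mean([pearson(p, t) for p, t in zip(pred_by_model, true_by_model)]))
    return pooled, per_model
```

For example, two models whose residue scores are each perfectly ranked but offset in opposite directions give a per-model average of 1.0 and a negative pooled correlation.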
Fig. 5.
Distribution of the estimation error between the predicted and true lDDT scores on the AlphaFold2_test dataset. The difference between the AF2_plddt scores and the true lDDT scores (green) is significant (P < 0.01), whereas the difference between the pLDDT scores predicted by EnQA-MSA and the true lDDT scores (red) is not (P = 0.117).
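The caption reports P-values without naming the test. Purely as an illustration of testing whether a predictor's per-residue estimation error is systematically nonzero, the sketch below uses a paired sign-flip permutation test on the differences `pred - true`; this is a swapped-in technique, not necessarily the test the authors used, and `sign_flip_test` is a hypothetical name.

```python
import numpy as np

def sign_flip_test(pred, true, n_perm=10000, seed=0):
    """Two-sided paired sign-flip permutation test of mean(pred - true) == 0.

    ASSUMPTION: illustrative test choice; the figure does not say which test
    produced its P-values.
    """
    d = np.asarray(pred, float) - np.asarray(true, float)  # per-residue estimation error
    observed = abs(d.mean())
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))  # random sign per difference
    null = np.abs((signs * d).mean(axis=1))                 # null distribution of |mean|
    # The add-one correction keeps the estimated P-value strictly positive.
    return (1 + np.sum(null >= observed)) / (n_perm + 1)
```

Under the null hypothesis each error is equally likely to be positive or negative, so randomly flipping signs generates the reference distribution; a consistent bias, such as systematic overestimation, yields a very small P-value.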
Fig. 6.
Residue-level Pearson correlation coefficients for model QA when different features are randomly permuted. Red dots mark the median.
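Fig. 6 follows the idea of permutation feature importance: shuffle one input feature, re-run the predictor, and see how much the correlation with the true scores drops. The sketch below is a generic version under stated assumptions. `predict` stands in for a trained QA model, the feature matrix is hypothetical, and the spread here comes from repeated shuffles, whereas the paper's distributions may be taken over models.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Drop in Pearson r between predict(X) and y when each feature column is shuffled.

    predict: any callable mapping an (N, F) feature matrix to N per-residue scores.
    Returns an (F, n_repeats) array of correlation drops.
    """
    rng = np.random.default_rng(seed)
    base = np.corrcoef(predict(X), y)[0, 1]
    drops = np.zeros((X.shape[1], n_repeats))
    for f in range(X.shape[1]):
        for k in range(n_repeats):
            Xp = X.copy()
            Xp[:, f] = rng.permutation(Xp[:, f])  # break the link between feature f and y
            drops[f, k] = base - np.corrcoef(predict(Xp), y)[0, 1]
    return drops
```

Taking `np.median(drops, axis=1)` gives one summary per feature, analogous to the medians marked in the figure. Features the model relies on produce large drops, and unused features produce none.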
