F1000Res. 2013 Oct 10;2:211.
doi: 10.12688/f1000research.2-211.v3. eCollection 2013.

Protein structure quality assessment based on the distance profiles of consecutive backbone Cα atoms

Sandeep Chakraborty et al. F1000Res.

Abstract

Predicting the three-dimensional native state structure of a protein from its primary sequence is an unsolved grand challenge in molecular biology. Two main computational approaches have evolved to obtain the structure from the protein sequence: ab initio/de novo methods and template-based modeling, both of which typically generate multiple possible native state structures. Model quality assessment programs (MQAPs) validate these predicted structures in order to identify the correct native state structure. Here, we propose an MQAP for assessing the quality of protein structures based on the distances between consecutive Cα atoms. We hypothesize that the root-mean-square deviation of the distance between consecutive Cα atoms (RDCC) from the ideal value of 3.8 Å, derived from a statistical analysis of high-quality protein structures (top100H database), is minimized in native structures. Based on tests with the top100H set, we propose an RDCC cutoff value of 0.012 Å, above which a structure can be filtered out as non-native. We applied the RDCC discriminator to decoy sets from the Decoys 'R' Us database and show that the native structures in all decoy sets tested have an RDCC below the 0.012 Å cutoff. While most decoy sets were either indistinguishable using this discriminator or had very few violations, all the decoy structures in the fisa decoy set were discriminated by applying the RDCC criterion. This highlights the physical non-viability of the fisa decoy set, and possible issues in benchmarking other methods using this set. The source code and manual are available at https://github.com/sanchak/mqap and permanently archived at doi:10.5281/zenodo.7134.
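The RDCC criterion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (their code is at the GitHub link above). It assumes that RDCC is the plain root-mean-square deviation of consecutive Cα-Cα distances from 3.8 Å, averaged over all consecutive pairs; the paper's exact normalization may differ. The function names are hypothetical, and the parser assumes standard fixed-column PDB ATOM records with a single chain, no chain breaks and no alternate locations.

```python
import math

IDEAL_CA_DISTANCE = 3.8  # Å, ideal consecutive Cα distance quoted in the abstract
RDCC_CUTOFF = 0.012      # Å, cutoff proposed in the abstract


def ca_coords_from_pdb_lines(lines):
    """Collect Cα coordinates from fixed-column PDB ATOM records, in file order.

    Assumption: one chain, no gaps, no alternate locations (not handled here).
    """
    coords = []
    for line in lines:
        # Atom name occupies columns 13-16; x, y, z occupy columns 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def consecutive_distances(coords):
    """Euclidean distance between each Cα atom and the next one in sequence."""
    return [math.dist(a, b) for a, b in zip(coords, coords[1:])]


def rdcc(coords, ideal=IDEAL_CA_DISTANCE):
    """Root-mean-square deviation of consecutive Cα distances from `ideal`.

    Assumption: the mean is taken over the number of consecutive pairs.
    """
    distances = consecutive_distances(coords)
    if not distances:
        raise ValueError("need at least two Cα atoms")
    return math.sqrt(sum((d - ideal) ** 2 for d in distances) / len(distances))


def passes_filter(coords, cutoff=RDCC_CUTOFF):
    """True if the structure is not filtered out (RDCC at or below the cutoff)."""
    return rdcc(coords) <= cutoff


# Toy example: three collinear Cα atoms spaced 3.8 Å and 4.1 Å apart.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.9, 0.0, 0.0)]
print(round(rdcc(toy), 3), passes_filter(toy))
```

In the toy example, a single 4.1 Å spacing is enough to push the value far above the 0.012 Å cutoff, so the structure would be filtered out under these assumptions.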

Conflict of interest statement

Competing interests: No competing interests were disclosed.

Figures

Figure 1. Root-mean-square deviation of the distance of consecutive Cα (RDCC) atoms from the ideal value of 3.8 Å.
(a) Probability distribution (P(x)) for the distance of consecutive Cα atoms in ~100 proteins in the top100H database. (b) RDCC in ~100 high quality structures from the top100H database. (c) Variation in specificity based on the cutoff value. We choose 0.012 Å as the cutoff for filtering out non-native structures. (d) RDCC in the I-TASSER CASP8 decoy suite. (e) RDCC for protein structures based on the resolution.
Figure 2. Root-mean-square deviation (RMSD) of the distance of consecutive Cα (RDCC) atoms from the ideal value of 3.8 Å in decoy sets.
The hg_structal and misfold decoy sets are indistinguishable using the distance discriminator, unlike the fisa decoy set. We have shown ~25 decoy structures from the fisa set, but the values apply to all the decoys (more than 500). The first protein (the native structure) in each set has RDCC below the 0.012 Å cutoff.
Figure 3. Superimposition of the native structure and a decoy structure (AXPROA00-MIN) for a protein (PDBid:1FC2) taken from the fisa decoy set.
The native structure is in red, and the decoy structure is in green. The structures are superimposed using MUSTANG. The distance between the Ile12/Cα and Leu13/Cα atoms is 3.8 Å and 4.1 Å in the native and the decoy structures, respectively.
