Biomed Res Int. 2017;2017:5760612.
doi: 10.1155/2017/5760612. Epub 2017 Oct 8.

All-Atom Four-Body Knowledge-Based Statistical Potentials to Distinguish Native Protein Structures from Nonnative Folds

Majid Masso. Biomed Res Int. 2017.

Erratum in

Abstract

Recent advances in understanding protein folding have benefitted from coarse-grained representations of protein structures. Empirical energy functions derived from these techniques occasionally succeed in distinguishing native structures from their corresponding ensembles of nonnative folds or decoys which display varying degrees of structural dissimilarity to the native proteins. Here we utilized atomic coordinates of single protein chains, comprising a large diverse training set, to develop and evaluate twelve all-atom four-body statistical potentials obtained by exploring alternative values for a pair of inherent parameters. Delaunay tessellation was performed on the atomic coordinates of each protein to objectively identify all quadruplets of interacting atoms, and atomic potentials were generated via statistical analysis of the data and implementation of the inverted Boltzmann principle. Our potentials were evaluated using benchmarking datasets from Decoys-'R'-Us, and comparisons were made with twelve other physics- and knowledge-based potentials. Ranking 3rd, our best potential tied CHARMM19 and surpassed AMBER force field potentials. We illustrate how a generalized version of our potential can be used to empirically calculate binding energies for target-ligand complexes, using HIV-1 protease-inhibitor complexes for a practical application. The combined results suggest an accurate and efficient atomic four-body statistical potential for protein structure prediction and assessment.
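The abstract does not state the scoring formula, so the following is only a minimal sketch of one common way an inverted Boltzmann score can be built for atomic quadruplets. It assumes a multinomial reference distribution computed from atom-type frequencies and an energy of the form -ln(f/p), where f is the observed and p the expected relative frequency of a quadruplet type. The function names, the reference state, and the natural-log convention are illustrative assumptions, not the paper's definitions.

```python
import math
from collections import Counter


def quad_type(labels):
    """Order-independent quadruplet type, e.g. ('C', 'C', 'N', 'S')."""
    return tuple(sorted(labels))


def multinomial_expected(qtype, type_freq):
    """Expected frequency of a quadruplet type under an assumed
    multinomial reference built from atom-type frequencies."""
    coeff = math.factorial(4)
    prob = 1.0
    for label, k in Counter(qtype).items():
        coeff //= math.factorial(k)
        prob *= type_freq[label] ** k
    return coeff * prob


def inverse_boltzmann_scores(quadruplets, type_freq):
    """Map each observed quadruplet type to -ln(f_observed / p_expected).
    Types that were never observed are simply absent from the result."""
    observed = Counter(quad_type(q) for q in quadruplets)
    total = sum(observed.values())
    scores = {}
    for qtype, n in observed.items():
        f = n / total
        p = multinomial_expected(qtype, type_freq)
        scores[qtype] = -math.log(f / p)
    return scores
```

Under this assumed convention, a quadruplet type seen more often than the reference predicts receives a negative (favorable) energy, and a structure's total score would be the sum over its tetrahedra.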


Figures

Figure 1
HIV-1 protease (a) ribbon and (b) atomic ball-and-stick diagrams. The atomic coordinates are used as tetrahedral vertices to generate (c) the Delaunay tessellation of the protein chain, a convex hull consisting of thousands of space-filling and nonoverlapping tetrahedra, each of whose vertices objectively identifies a quadruplet of nearest neighbor atoms. The modified tessellation in (d) is obtained by removing all edges longer than 12 Å between pairs of atoms, thereby eliminating all tetrahedra that share those edges and excluding their corresponding atomic quadruplets from consideration as nearest neighbors.
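The edge-length filter described in this caption can be sketched as follows. The sketch assumes the tessellation has already been computed elsewhere (for example, `scipy.spatial.Delaunay(coords).simplices` yields tetrahedra as rows of four point indices) and only shows the cutoff step. The function name and argument layout are hypothetical.

```python
import math
from itertools import combinations


def filter_by_edge_cutoff(coords, tetrahedra, cutoff=12.0):
    """Keep only tetrahedra whose six edges are all <= cutoff (in Å).

    coords     : sequence of (x, y, z) atomic coordinates
    tetrahedra : iterable of 4-tuples of indices into coords
    """
    kept = []
    for tet in tetrahedra:
        longest = max(
            math.dist(coords[i], coords[j]) for i, j in combinations(tet, 2)
        )
        if longest <= cutoff:
            kept.append(tuple(tet))
    return kept
```

Dropping a tetrahedron whenever any one of its edges exceeds the cutoff matches the caption's description that removing a long edge eliminates every tetrahedron sharing it.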
Figure 2
Graphical representations for two four-body potentials, based on an eight-letter alphabet with a 12 Å edge-length cutoff, and a twenty-letter alphabet with a 4.8 Å edge-length cutoff. Here Cα = alpha-carbon, CB = backbone carbonyl-carbon, and S = side-chain sulfur (from either cysteine or methionine) represent the same atom types in both alphabets, with quadruplets SSSS and CαCBCBCB appearing at the same extremes of both potentials. Despite millions of tetrahedra generated by the 1417 protein tessellations irrespective of the cutoff length (see Table 1), note that 3 of 330 atomic quadruplet types (CαCαCαS, CBCBCBCB, and CαCBCBNS) did not appear at all as tetrahedral vertices based on an 8-letter atomic alphabet with a 12 Å cutoff (NS = side-chain nitrogen atom), while 1935 of 8855 quadruplet types were not observed under a 20-letter alphabet with a 4.8 Å cutoff.
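The totals of 330 and 8855 quadruplet types quoted in this caption are consistent with counting unordered selections of four atoms, repetition allowed, from an alphabet of n letters, which gives C(n + 3, 4). A short check, with a hypothetical function name:

```python
import math
from itertools import combinations_with_replacement


def count_quadruplet_types(alphabet_size):
    """Number of unordered 4-letter multisets from an n-letter alphabet."""
    return math.comb(alphabet_size + 3, 4)


# Cross-check the closed form by explicit enumeration for the 8-letter case.
enumerated = sum(1 for _ in combinations_with_replacement(range(8), 4))

print(count_quadruplet_types(8), enumerated)   # 330 330
print(count_quadruplet_types(20))              # 8855
```

By subtraction, the caption's figures imply 327 observed types for the 8-letter alphabet and 6920 for the 20-letter alphabet.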
Figure 3
Sampling of calculated energy versus rmsd plots for four decoy sets. A different atomic four-body statistical potential energy function (i.e., distinct pairs of atomic alphabet size and tessellation edge-length cutoff parameters) was selected to compute the energy values for each plot. The plots reveal wide variability in the number of alternative conformations for a given native structure based on decoy category, and they highlight the relative strengths and weaknesses of native rank, correlation coefficient (r), z-score, and fractional enrichment (FE) as performance measures under a range of conditions, hence reinforcing their collective importance for evaluating energy functions.
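The four performance measures named in this caption can be sketched as below. The caption does not give their definitions, so the forms used here are assumptions based on common usage: native rank counts decoys with lower energy than the native structure, the z-score uses the population standard deviation of the decoy energies, r is the Pearson correlation between energy and rmsd, and fractional enrichment is taken as the share of the lowest-rmsd 10% of decoys that also fall in the lowest-energy 10%. All function names are hypothetical.

```python
import math
import statistics


def native_rank(native_energy, decoy_energies):
    """Rank 1 means no decoy has a lower energy than the native."""
    return 1 + sum(e < native_energy for e in decoy_energies)


def z_score(native_energy, decoy_energies):
    """Distance of the native energy from the decoy mean, in decoy std units."""
    mu = statistics.fmean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)
    return (native_energy - mu) / sigma


def pearson_r(xs, ys):
    """Pearson correlation coefficient, e.g. between rmsd and energy."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fractional_enrichment(rmsds, energies, fraction=0.1):
    """Assumed definition: overlap between the lowest-rmsd and
    lowest-energy subsets of decoys, as a share of the subset size."""
    n = len(rmsds)
    k = max(1, round(n * fraction))
    low_rmsd = set(sorted(range(n), key=lambda i: rmsds[i])[:k])
    low_energy = set(sorted(range(n), key=lambda i: energies[i])[:k])
    return len(low_rmsd & low_energy) / k
```

Reporting all four together follows the caption's point that each measure has its own blind spots: for example, a native structure can rank first while energy and rmsd remain poorly correlated across the decoys.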
Figure 4
Visualization of a procedure based on a simplified model to calculate target-ligand binding affinity (Δtp) with the four-body potential.
Figure 5
Scatter plots of experimental versus calculated binding energy for (a) twenty-five HIV-1 protease-inhibitor complexes culled by Jenwitheesuk and Samudrala [25] and (b) a larger set of 140 such complexes which include the initial twenty-five, with the remainder obtained by searching the Binding MOAD database [26, 27]. Both experimental binding energies and crystallographic structures are available for these complexes, and the latter were required for calculating binding energy (Δts) as outlined in the text and in Figure 4.

References

    1. Berman H., Henrick K., Nakamura H., Markley J. L. The worldwide protein data bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Research. 2007;35(supplement 1):D301–D303. doi: 10.1093/nar/gkl971.
    2. Rykunov D., Fiser A. New statistical potential for quality assessment of protein models and a survey of energy functions. BMC Bioinformatics. 2010;11, article 128. doi: 10.1186/1471-2105-11-128.
    3. Summa C. M., Levitt M., DeGrado W. F. An atomic environment potential for use in protein structure prediction. Journal of Molecular Biology. 2005;352(4):986–1001. doi: 10.1016/j.jmb.2005.07.054.
    4. Zhang C., Liu S., Zhou H., Zhou Y. An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state. Protein Science. 2004;13(2):400–411. doi: 10.1110/ps.03348304.
    5. Rykunov D., Fiser A. Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials. Proteins: Structure, Function and Genetics. 2007;67(3):559–568. doi: 10.1002/prot.21279.
