Comput Struct Biotechnol J. 2020;18:2228-2236.
doi: 10.1016/j.csbj.2020.08.013. Epub 2020 Aug 18.

An information gain-based approach for evaluating protein structure models

Guillaume Postic et al. Comput Struct Biotechnol J. 2020.

Abstract

For three decades now, knowledge-based scoring functions that operate through the "potential of mean force" (PMF) approach have continuously proven useful for studying protein structures. Although these statistical potentials are not to be confused with their physics-based counterparts of the same name (i.e., PMFs obtained by molecular dynamics simulations), their particular success in assessing the native-like character of protein structure predictions has led authors to consider the computed scores as approximations of the free energy. However, this physical justification has been a matter of controversy from the beginning. Alternative interpretations based on Bayes' theorem have been proposed, but the misleading formalism that invokes the inverse Boltzmann law remains recurrent in the literature. In this article, we present a conceptually new method for ranking protein structure models by quality, which is (i) independent of any physics-based explanation and (ii) relevant to statistics and to a general definition of information gain. The theoretical development described in this study provides new insights into how statistical PMFs work, in comparison with our approach. To prove the concept, we have built interatomic distance-dependent scoring functions, based on the former and new equations, and compared their performance on an independent benchmark of 60,000 protein structures. The results demonstrate that our new formalism outperforms statistical PMFs in evaluating the quality of protein structural decoys. Therefore, this original type of score offers a possibility to improve the success of statistical PMFs in the various fields of structural biology where they are applied. The open-source code is available for download at https://gitlab.rpbs.univ-paris-diderot.fr/src/ig-score.

Keywords: Knowledge-based scoring functions; Model quality assessment; Protein structure prediction; Statistical potentials.
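
For readers unfamiliar with the statistical PMF formalism that the abstract contrasts with the new score, the sketch below shows the textbook "inverse Boltzmann" form, in which each distance bin is scored as the negative log-ratio of an observed frequency to a reference frequency. It is not the authors' TIG equation (which is not given in this record) and not their code: the function names, bin width, distance cutoff, and pseudocount are illustrative assumptions.

```python
import math


def distance_histogram(distances, bin_width=1.0, max_dist=15.0):
    """Count distances (in angstroms) per bin; values outside [0, max_dist) are ignored."""
    n_bins = int(round(max_dist / bin_width))
    counts = [0] * n_bins
    for d in distances:
        if 0.0 <= d < max_dist:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int(d / bin_width), n_bins - 1)] += 1
    return counts


def statistical_pmf(obs_counts, ref_counts, pseudocount=1.0):
    """Per-bin score -ln(p_obs / p_ref), in units of kT (hypothetical helper).

    The pseudocount is an assumption added here to avoid log(0) in empty bins.
    """
    n_bins = len(obs_counts)
    obs_total = sum(obs_counts) + pseudocount * n_bins
    ref_total = sum(ref_counts) + pseudocount * n_bins
    scores = []
    for obs, ref in zip(obs_counts, ref_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        scores.append(-math.log(p_obs / p_ref))
    return scores


def score_model(distances, pmf, bin_width=1.0, max_dist=15.0):
    """Sum the per-bin scores over all pairwise distances found in one model."""
    total = 0.0
    for d in distances:
        if 0.0 <= d < max_dist:
            total += pmf[min(int(d / bin_width), len(pmf) - 1)]
    return total
```

Under this convention, a bin that is populated more often in native structures than in the reference state receives a negative (favourable) score, so lower totals indicate more native-like models.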

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Examples of protein models correctly and incorrectly ranked with the information-gain based approach, TIG. For each example, the better and worse models are represented in blue and red, respectively. (A) Predicted structures of the CASP13 target T1006 (magnetosome protein MamM) correctly ranked by TIG, but incorrectly ranked by the PMF, mock, and DOPE scoring functions. (B) Decoy structures of the ATP-binding subunit ClpC1 of the Clp protease (PDB code 3wdeA) from the 3DRobot dataset, which are correctly ranked by all methods except TIG. (C) Predicted structures of the target T0971 (terfestatin biosynthesis enzyme TerC), for which only TIG fails. (D) Decoy structures of the DUB domain of the human zinc metalloprotease AMSH-LP (PDB code 2znrA), for which only TIG succeeds. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
Predicted quality (TIG score) of decoy structures from 3DRobot plotted against their true quality (TM-score). The Pearson correlation coefficient r is given for each example. (A) Conserved domain of nonstructural protein 3 (nsP3) from SARS coronavirus (PDB code 2acfA; 182 residues). (B) Dihydroneopterin aldolase from Escherichia coli (PDB code 2o90A; 122 residues). (C) Catalytic domain of the DNA glycosylase MutY (PDB code 1munA; 225 residues). (D) Protoglobin from Methanosarcina acetivorans (PDB code 3qzxA; 195 residues).
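
As a reminder of how the coefficient r quoted in Fig. 2 is defined, here is a minimal pure-Python sketch of the Pearson correlation between predicted scores and true quality values. The function name and the numbers in the usage lines are made up for illustration and are not taken from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (the common 1/n factor cancels in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted scores and TM-scores for five decoys of one target.
predicted = [0.10, 0.35, 0.50, 0.70, 0.90]
tm_scores = [0.22, 0.41, 0.48, 0.75, 0.88]
r = pearson_r(predicted, tm_scores)
```

A value of r close to 1 or -1 means the score tracks the true quality almost linearly; the sign depends on whether higher or lower scores denote better models.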
Fig. 3
Score profiles from the TIG (blue) and PMF (green) methods. The interacting atoms are the Cα of the (A) Cys-Cys, (B) Asp-Glu, (C) Val-Val, and (D) Lys-Arg residue pairs. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
