PLoS One. 2009;4(1):e4203.
doi: 10.1371/journal.pone.0004203. Epub 2009 Jan 15.

Similarity measures for protein ensembles


Kresten Lindorff-Larsen et al. PLoS One. 2009.

Abstract

Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation (RMSD), have therefore been developed to quantify the similarity of different protein conformations. However, instead of examining individual conformations, it is in many cases more relevant to analyse ensembles of conformations obtained either from experiments or from methods such as molecular dynamics simulations. Here we present three approaches that can be used to compare conformational ensembles in the same way that the RMSD is used to compare individual pairs of structures. The methods are based on estimating the probability distributions underlying the ensembles and then comparing these distributions. We first validate the methods using a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during protein structure determination, and find that an ensemble refinement method recovers the correct distribution of conformations better than standard single-molecule refinement.
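The RMSD referred to above compares two individual conformations. As a point of reference, here is a minimal Python sketch of the RMSD, assuming both conformations list the same atoms in the same order and have already been optimally superimposed; the fitting step (for example the Kabsch algorithm) is deliberately omitted, and the function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two pre-superimposed conformations (no fitting is performed)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("conformations must contain the same, non-zero number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom moved by 1.0 along z
    print(rmsd(ref, shifted))  # 1.0
```

Because no superposition is done, a rigidly translated copy still gives a non-zero value here; a full RMSD would first remove the translation and rotation.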


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Representative structures from three ensembles generated using molecular dynamics simulations.
These ensembles of the GB1 domain of protein G were obtained from MD simulations with mass-weighted harmonic restraints of decreasing strength: ensembles A, B and C were obtained using force constants of 0.1, 0.01 and 0.001, respectively.
Figure 2
Figure 2. Comparison of the three test ensembles using a method that quantifies the co-occurrence of structures during conformational clustering.
A: Populations of each of the ensembles A, B and C in each of the 12 clusters that we obtained using a cluster preference of −10. B: Jensen-Shannon divergence between the three ensembles at a series of cluster preferences giving rise to between 2 and 7500 clusters.
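The clustering comparison in panel B can be sketched as follows: pool the conformations from all ensembles, cluster them, count what fraction of each ensemble falls in each cluster, and compare those population vectors with the Jensen-Shannon divergence. This minimal Python sketch assumes the cluster labels are already available (for example from affinity propagation, which is not implemented here) and uses natural logarithms, so the divergence lies between 0 and ln 2; the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def cluster_populations(labels: Sequence[Hashable], ensemble_of: Sequence[Hashable],
                        ensemble: Hashable, clusters: Sequence[Hashable]) -> List[float]:
    """Fraction of one ensemble's conformations that fall in each cluster."""
    counts = Counter(lab for lab, ens in zip(labels, ensemble_of) if ens == ensemble)
    total = sum(counts.values())
    return [counts[c] / total for c in clusters]


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence of two discrete distributions (natural log, max ln 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x: Sequence[float]) -> float:
        # Empty bins contribute nothing; m_i > 0 wherever x_i > 0.
        return sum(xi * math.log(xi / mi) for xi, mi in zip(x, m) if xi > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


if __name__ == "__main__":
    # Hypothetical toy data: cluster label and source ensemble of each pooled conformation.
    labels = [0, 0, 1, 1, 0, 1, 0, 1, 2, 2, 2, 2]
    source = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
    clusters = sorted(set(labels))
    pops = {e: cluster_populations(labels, source, e, clusters) for e in "ABC"}
    print(pops)
    print(jensen_shannon(pops["A"], pops["B"]))  # same populations -> 0.0
    print(jensen_shannon(pops["A"], pops["C"]))  # disjoint clusters -> ln 2
```

Repeating the calculation for clusterings with different numbers of clusters gives a curve like the one in panel B.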
Figure 3
Figure 3. Comparison of the three test ensembles using a method which involves dimensionality reduction and kernel density estimation.
A: Average residual stress according to Eq. 8 over 5 independent SPE projections of the three test ensembles. The standard deviation is smaller than the symbols shown. B: Example of a two-dimensional projection of the three ensembles. Each point represents an individual conformation, and the distance between two nearby points approximates the RMSD between the corresponding conformations. The two axes are the two dimensions of the SPE projection subspace. C: Contour plots of the two-dimensional kernel density estimates for the points in panel B. The grey bars next to the plots indicate the scale of the probability densities. D: Average and standard deviation of the Jensen-Shannon divergence between the three ensembles, calculated from the kernel density estimates, for projections of different dimensionality.
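The projections in panels A and B come from stochastic proximity embedding (SPE). Below is a minimal sketch of the SPE update rule as it is usually published: pick a random pair, move both points along their connecting line to reduce the mismatch between projected and target distance, and, beyond a cutoff, only push apart points that are too close. It assumes a linearly decreasing learning rate; the parameter names and defaults are illustrative. The `stress` function is a generic normalized stress, not necessarily the residual stress of Eq. 8.

```python
import math
import random
from typing import List, Sequence


def spe(r: Sequence[Sequence[float]], dim: int = 2, cutoff: float = math.inf,
        cycles: int = 50, steps: int = 2000, lam0: float = 1.0,
        seed: int = 0, eps: float = 1e-10) -> List[List[float]]:
    """Embed n objects with target distances r (n x n) into `dim` dimensions."""
    rng = random.Random(seed)
    n = len(r)
    y = [[rng.random() for _ in range(dim)] for _ in range(n)]
    lam = lam0
    for _ in range(cycles):
        for _ in range(steps):
            i, j = rng.sample(range(n), 2)
            dij = math.dist(y[i], y[j])
            rij = r[i][j]
            # Within the cutoff, match the target distance; beyond it, only
            # push apart points that are closer than their target distance.
            if rij <= cutoff or dij < rij:
                f = 0.5 * lam * (rij - dij) / (dij + eps)
                diff = [a - b for a, b in zip(y[i], y[j])]
                y[i] = [a + f * d for a, d in zip(y[i], diff)]
                y[j] = [b - f * d for b, d in zip(y[j], diff)]
        lam -= lam0 / cycles  # linear decrease; stays positive during every cycle
    return y


def stress(r: Sequence[Sequence[float]], y: Sequence[Sequence[float]]) -> float:
    """Generic normalized stress sqrt(sum (d_ij - r_ij)^2 / sum r_ij^2) over all pairs."""
    num = den = 0.0
    for i in range(len(r)):
        for j in range(i + 1, len(r)):
            d = math.dist(y[i], y[j])
            num += (d - r[i][j]) ** 2
            den += r[i][j] ** 2
    return math.sqrt(num / den)


if __name__ == "__main__":
    # Hypothetical target distances: a 3-4-5 right triangle, exactly embeddable in 2D.
    r = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
    y = spe(r, dim=2, cycles=20, steps=2000)
    print(stress(r, y))  # close to 0 for an exactly embeddable distance matrix
```

With a finite `cutoff`, target distances above it act only as lower bounds, which is one way a projection can preserve pairwise RMSD locally while distorting large distances.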
Figure 4
Figure 4. Six ensembles of the GB1 domain of protein G.
The reference ensemble was obtained using molecular dynamics simulations and was used to generate a set of synthetic pseudo-experimental distance restraints. These restraints were then used in either single-conformer refinement (ensemble size 1) or ensemble refinement with ensemble sizes 2, 4, 8 and 16. Ten structures from each ensemble are shown, with all non-hydrogen atoms drawn.
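In ensemble refinement, each distance restraint is applied to an average over the replicas of the ensemble rather than to a single structure. A minimal sketch, assuming the conventional NOE-style average <r^-6>^(-1/6) and a flat-bottomed harmonic penalty above an upper bound; whether the refinements shown here used exactly this averaging and penalty is an assumption, and the function names and numbers are hypothetical.

```python
from typing import Sequence


def ensemble_averaged_distance(distances: Sequence[float], power: float = 6.0) -> float:
    """<r^-p>^(-1/p) over the replicas; p = 6 is the usual NOE choice (assumed here)."""
    if not distances or min(distances) <= 0:
        raise ValueError("need at least one positive distance")
    mean = sum(d ** -power for d in distances) / len(distances)
    return mean ** (-1.0 / power)


def restraint_energy(distances: Sequence[float], upper: float, k: float = 1.0) -> float:
    """Hypothetical flat-bottomed harmonic penalty on the ensemble-averaged distance."""
    violation = max(0.0, ensemble_averaged_distance(distances) - upper)
    return 0.5 * k * violation ** 2


if __name__ == "__main__":
    upper = 5.0  # hypothetical NOE upper bound (same units as the distances)
    print(restraint_energy([6.0], upper))          # single conformer, violated: about 0.5
    print(restraint_energy([3.0, 8.0], upper))     # two replicas, satisfied on average: 0.0
    print(ensemble_averaged_distance([3.0, 8.0]))  # dominated by the short distance, about 3.37
```

Because the r^-6 average is dominated by the shortest distance, an ensemble in which only some replicas satisfy a restraint can still satisfy it on average, which lets ensemble refinement represent heterogeneity that a single conformer cannot.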
Figure 5
Figure 5. Examination of how well a reference ensemble can be recovered using ensemble simulations.
The results shown here were obtained using the clustering method described in the text. A: Populations of each ensemble (the MD reference and the ensembles obtained using NOE restraints) in each of the 8 clusters found by the affinity propagation clustering algorithm with a cluster preference of −20. B: Jensen-Shannon divergence between the reference ensemble and the ensembles obtained by applying NOE restraints to different ensemble sizes, shown for five representative values of the total number of clusters.
Figure 6
Figure 6. Examination of how well a reference ensemble can be recovered using ensemble simulations.
The results shown here were obtained using the projection method described in the text. A: Average residual stress according to Eq. 8 over 10 independent SPE projections of the six ensembles (the MD reference ensemble and five ensembles obtained from NOE restraints). The standard deviation is smaller than the symbols shown. B: Example of the two-dimensional kernel density estimates. The grey bars next to the plots indicate the scale of the probability densities. C: Average and standard deviation of the Jensen-Shannon divergence between the reference ensemble and the ensembles obtained by applying NOE restraints to different ensemble sizes. The results are shown for projections of different dimensionality and are averaged over 10 independent runs of the SPE algorithm.
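The density comparison in panel C (and in Figure 3D) can be sketched with isotropic Gaussian kernel density estimates of the projected points and a Monte Carlo estimate of the Jensen-Shannon divergence: draw samples from each estimate and average log(p/m), where m is the mixture (p + q)/2. The fixed bandwidth `h`, the Monte Carlo integration and the function names are assumptions of this sketch, not necessarily how the densities were estimated or integrated in the paper.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, float]


def kde_density(x: Point, data: Sequence[Point], h: float) -> float:
    """Isotropic 2D Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (2.0 * math.pi * h * h * len(data))
    return norm * sum(
        math.exp(-((x[0] - a) ** 2 + (x[1] - b) ** 2) / (2.0 * h * h)) for a, b in data
    )


def sample_kde(data: Sequence[Point], h: float, rng: random.Random) -> Point:
    """Draw one point from the KDE: pick a data point, add Gaussian noise of width h."""
    a, b = rng.choice(data)
    return (rng.gauss(a, h), rng.gauss(b, h))


def js_divergence_kde(p_data: Sequence[Point], q_data: Sequence[Point],
                      h: float = 0.5, n_samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the Jensen-Shannon divergence (natural log) of two KDEs."""
    rng = random.Random(seed)

    def kl_to_mixture(own: Sequence[Point], other: Sequence[Point]) -> float:
        # Estimates KL(own || m) with m = (own + other) / 2, sampling x from own.
        total = 0.0
        for _ in range(n_samples):
            x = sample_kde(own, h, rng)
            p = kde_density(x, own, h)
            m = 0.5 * (p + kde_density(x, other, h))
            total += math.log(p / m)
        return total / n_samples

    return 0.5 * kl_to_mixture(p_data, q_data) + 0.5 * kl_to_mixture(q_data, p_data)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical 2D projected coordinates for two ensembles.
    a = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
    b = [(rng.gauss(1.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
    print(js_divergence_kde(a, b))  # an estimate of the divergence, at most ln 2
```

Each sampled term is bounded above by ln 2, so the estimate cannot exceed the theoretical maximum; in higher-dimensional projections the same estimator applies with a d-dimensional kernel.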

