Protein Sci. 2018 Jan;27(1):41-50.
doi: 10.1002/pro.3249. Epub 2017 Aug 11.

Ensemblator v3: Robust atom-level comparative analyses and classification of protein structure ensembles

Andrew E Brereton et al. Protein Sci. 2018 Jan.

Abstract

Ensembles of protein structures are increasingly used to represent the conformational variation of a protein as determined by experiment and/or by molecular simulations, as well as uncertainties that may be associated with structure determinations or predictions. Making the best use of such information requires the ability to quantitatively compare entire ensembles. For this reason, we recently introduced the Ensemblator (Clark et al., Protein Sci 2015; 24:1528), a novel approach to compare user-defined groups of models in residue-level detail. Here we describe Ensemblator v3, an open-source program that employs the same basic ensemble comparison strategy but includes major advances that make it more robust, powerful, and user-friendly. Ensemblator v3 carries out multiple sequence alignments to facilitate the generation of ensembles from non-identical input structures, automatically optimizes the key global overlay parameter, optionally performs "ensemble clustering" to classify the models into subgroups, and calculates a novel "discrimination index" that quantifies similarities and differences, at residue or atom level, between each pair of subgroups. The clustering and automatic options mean that no prior knowledge about an ensemble is required for its analysis. After describing the novel features of Ensemblator v3, we demonstrate its utility using three case studies that illustrate the ease with which complex analyses are accomplished, and the kinds of insights derived from clustering into subgroups and from the detailed information that locates significant differences. Ensemblator v3 enhances the structural biology toolbox by greatly expanding the kinds of problems to which this ensemble comparison strategy can be applied.

Keywords: NMR ensemble; Rosetta; clustering; ensemble clustering; protein structure comparison; python; structure prediction; superposition; template-based modeling.

Figures

Figure 1
Analysis of the solution structure of RNase Sa. (A) Discrimination Index (DI) plots for the pairwise comparisons of the three groups identified by the Ensemblator. The residue‐based global DI (blue) and the local DI (green) are averaged to create the unified DI (red). The median unified DI is also indicated (black line). (B) Wire‐diagram tracing of the backbone path in the region of largest inter‐group difference (residues 44–49): Group 1 (blue; models 1,2,7,8,10,13–15); Group 2 (green; models 3–6,9,11,12); Group 3 (red; models 16–20). (C) Wire‐diagram as in (B), for groups identified by analysis of only residues 38–58: Group 1 (blue; models 3–7,9,12,16–20); Group 2 (red; models 1,2,8,10,11,13–15). The tighter backbone spread results from the more local overlay. (D) φ,ψ values for residues 46 (circles), 47 (squares), and 48 (triangles) representative of the three groups shown in panel (B) (blue, green, red) and the X‐ray structures (purple). The ±30° boxes indicate the areas used in Protein Geometry Database (ref. 11) searches for tripeptides present in structures solved at 1.5‐Å resolution or better that have no more than 25% sequence identity to one another. The tripeptide conformation in all the X‐ray models was found 467 times (0.34% of all tripeptides), while zero occurrences were found for the NMR conformations.
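The averaging described in panel (A) can be illustrated with a minimal Python sketch. It assumes only what the caption states: the unified DI at each residue is the mean of the global and local DI, and a single median is taken over the unified values. The function name `unified_di` and the example numbers are hypothetical, not taken from the Ensemblator source code or from the paper's data.

```python
from statistics import median


def unified_di(global_di, local_di):
    """Average per-residue global and local DI values (hypothetical helper)."""
    if len(global_di) != len(local_di):
        raise ValueError("global and local DI must cover the same residues")
    return [(g + l) / 2 for g, l in zip(global_di, local_di)]


# Made-up DI values for four residues, for illustration only.
global_di = [0.25, 0.75, 1.0, 0.5]
local_di = [0.75, 0.25, 1.0, 0.0]

unified = unified_di(global_di, local_di)
median_unified = median(unified)  # the horizontal line drawn in the DI plots

print(unified)         # [0.5, 0.5, 1.0, 0.25]
print(median_unified)  # 0.5
```

How the global and local DI values themselves are computed is not described in this record, so the sketch takes them as given inputs.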
Figure 2
Analysis of a mixed‐source ensemble of the FK506 binding protein (FKBP). (A) t‐SNE dimensionality reduction results showing a 2D visualization of the relationships between the models in the N‐dimensional space used to cluster them. Per the key, the shape of each point represents the original label for a given model, and the clusters are differentiated by color (1: blue, 2: green, 3: red). (B) Backbone RMSDs along the chain for the final set of X‐ray (blue), NMR (green), and Rosetta‐produced (red) models. The bars indicate positions of β‐strands (purple), and α/3–10 helices (orange). (C) Discrimination Index (DI) plots for the Rosetta models vs. the X‐ray models. Residue‐based global (blue), local (green), and unified (black) DI are shown, along with the median unified DI (horizontal black line). Secondary structure indicated as in (B). (D) Wire‐diagram tracing the backbone for the X‐ray (blue), the NMR (green) and the Rosetta (red) models. The N‐ and C‐termini are indicated, as well as the position of residue 67, at the base of an α‐helix. (E) The φ,ψ‐angles for serine 67 in the Rosetta (red) and the X‐ray structures (blue) are shown. As context, the φ,ψ‐values of all serine residues in crystal structures at 1.5 Å resolution or better with ≤25% sequence identity to one another are indicated (black dots).
Figure 3
Ensemblator analysis of calmodulin (CaM) crystal structures. (A) Wire‐diagram backbone tracing for the ligand‐bound models (blue), and the ligand‐free models (red), as overlaid by the Ensemblator. (B) Discrimination indices (top panel; global (blue), local (green), unified (black), and median unified (horizontal black line)), and RMSDs from the global (middle panel) and local (bottom panel) comparisons for the entire CaM protein. In the global and local comparisons, the within‐group variation is shown for the ligand‐bound (green) and ligand‐free (blue) conformations. Also indicated are the inter‐group variation (black) and the closest approach distances (grey). (C) As in (B), except the analysis only included the N‐terminal domain. (D) As in (B), except the analysis only included the C‐terminal domain.
