Structure. 2011 Oct 12;19(10):1395-412.
doi: 10.1016/j.str.2011.08.006.

A new generation of crystallographic validation tools for the protein data bank

Randy J Read et al. Structure. 2011.

Abstract

This report presents the conclusions of the X-ray Validation Task Force of the worldwide Protein Data Bank (PDB). The PDB has expanded massively since current criteria for validation of deposited structures were adopted, allowing a much more sophisticated understanding of all the components of macromolecular crystals. The size of the PDB creates new opportunities to validate structures by comparison with the existing database, and the now-mandatory deposition of structure factors creates new opportunities to validate the underlying diffraction data. These developments highlighted the need for a new assessment of validation criteria. The Task Force recommends that a small set of validation data be presented in an easily understood format, relative to both the full PDB and the applicable resolution class, with greater detail available to interested users. Most importantly, we recommend that referees and editors judging the quality of structural experiments have access to a concise summary of well-established quality indicators.
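
The recommendation to report each metric relative to both the full PDB and the applicable resolution class can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the Task Force's procedure: the function names, the toy (resolution, clashscore) values, the ±0.25Å window used to define a resolution class, and the convention that ties count half.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, reference, lower_is_better=True):
    """Percent of reference values that are worse than `value` (ties count half).

    A higher result means a better relative standing. Illustrative convention only.
    """
    ordered = sorted(reference)
    n = len(ordered)
    if n == 0:
        raise ValueError("reference set is empty")
    lo = bisect_left(ordered, value)    # number of entries strictly smaller
    hi = bisect_right(ordered, value)   # number of entries smaller or equal
    ties = hi - lo
    worse = (n - hi) if lower_is_better else lo
    return 100.0 * (worse + 0.5 * ties) / n


def resolution_class(entries, resolution, half_width=0.25):
    """Scores of entries within +/- half_width of `resolution` (assumed window)."""
    return [score for res, score in entries if abs(res - resolution) <= half_width]


# Hypothetical (resolution in Å, clashscore) pairs; not real PDB statistics.
toy_entries = [
    (1.2, 4.0), (1.5, 6.0), (1.8, 9.0), (2.0, 12.0),
    (2.1, 15.0), (2.2, 20.0), (2.5, 25.0), (3.0, 35.0),
]

all_scores = [score for _, score in toy_entries]
vs_all = percentile_rank(14.0, all_scores)
vs_class = percentile_rank(14.0, resolution_class(toy_entries, 2.1))
print(f"vs. all entries: {vs_all:.1f}; vs. resolution class: {vs_class:.1f}")
```

The same score can rank differently against the two reference sets, which is why the report asks for both views side by side.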

Figures

Figure 1
Correction of a Local Error for Thr 32 in PDB 1sbp, a Quite Good Older Structure at 1.7Å Resolution (A) This side-chain in 1sbp (He and Quiocho, 1993) has many serious all-atom steric clashes (clusters of red spikes) and no hydrogen bonds, and the tetrahedral angles at N-Cα-Cβ and at Cγ2-Cβ-Oγ1 (labeled) are bad outliers. (B) The side-chain has been turned 180° and now has ideal geometry, no clashes, two good hydrogen bonds, and a slightly better fit to the density.
Figure 2
Ramachandran Distribution of ϕ,ψ Angles (A) The non-Gly, non-Pro distribution used in ProCheck, from about 100,000 residues of unfiltered data, plus the outlines for the ProCheck Favored, Allowed and Generously Allowed regions (taken from Morris et al., 1992). (B–G) The MolProbity-updated data distributions for the VTF-recommended 6 amino-acid categories, from about 825,000 residues after quality-filtering by resolution (<2Å), alternate conformations, and backbone B-factor (<30Å²). In (B–G), the inner contour encloses the favored 98% of the filtered data. The outer contour encloses 99.95% of the filtered data (all but 1 in 2000, or equivalent to 3.5σ), now feasible for individual categories as well as the general (Chen et al., 2010); this contour is taken to divide Ramachandran outliers from allowed conformations. See also Figure S1.
Figure 3
All PDB (X-ray, since 1990) Distribution of Validation Criteria as a Function of Resolution Median and quartile levels are plotted smoothly, along with all individual data points for outlier structures beyond the 1st percentile (poor; red) or the 99th percentile (good; blue) values (see Supplemental Information for detailed criteria, and for procedures and discussion of these shingle-smoothed, quartile-and-outer-percentile plots with outlier datapoints). At the right of each panel is the resolution-independent, 1-D distribution (green line) with median, quartile, and outer percentile values marked, for the aggregated set of all PDB entries. (A) Percent Ramachandran outliers. (B) MolProbity clashscores. (C) Rfree. See also Figures S2 and S3 and Table S1.
Figure 4
RosettaHoles2 Scores (A) RosettaHoles2 scores for crystal structures in the PDB as of May 14th 2010; only structures that contain primarily protein, have no missing nonhydrogen atoms, and are larger than 10 kDa are included. Structures marked in red contain at least one large void surrounded by hydrophobic side chains, 18 structures without such voids are marked with black circles, transmembrane proteins are marked with green squares, and the retracted structure 179L is marked with a cyan diamond. Structures marked as purple triangles have been identified as likely to arise from fabrication. On the right is a histogram of scores for 1.5Å, 2.5Å, and 3.5Å resolution bins. (B) A more detailed examination of the 179L cell parameter error, showing the voids present in 177L and 179L, which are identical except for the error. The degradation of the RosettaHoles2 score and its two components is shown for increasing void sizes.
Figure 5
Histograms Showing Distributions of Validation Criteria Computed with X-Ray Diffraction Data (A) Histogram showing the numbers of structures with different fractions of Wilson outliers. Datasets showing evidence of translational NCS (nonorigin Patterson peak >20% of the origin peak) have been omitted. Note the logarithmic scale on the vertical axis. (B) Histogram showing the numbers of structures for which the data show different levels of relative anisotropy. (C) Histogram showing the numbers of structures with different percentages of residues having RSR-Z > 2 (i.e., much poorer than average fit to density). The good quartile boundary is 1.0, the median is 2.7, the poor quartile boundary is 5.3, and the 1st percentile is 16.3. Approximately 11% of structures have no residues with RSR-Z > 2.
Figure 6
Possible Representations of Validation Metrics (A and B) Slider representation of key validation metrics for the verotoxin-1 B-subunit (Stein et al., 1992) before (A) (PDB entry 1bov) and after (B) (PDB entry 2xsc) a rebuild and rerefinement (Robert D. Oeffner & Gábor Bunkóczi, personal communication). The color scale across each bar represents the percentile score for each metric, with better scores to the right in blue. The solid bars show percentile relative to the entire PDB, whereas the ellipses show percentile relative to structures at similar resolution (2.1Å here). Note that the RSR-Z score is defined only on the all-PDB scale. This structure predates introduction of Rfree (Brünger, 1992); the value reported here was obtained by 10 macrocycles of refinement in phenix.refine (Afonine et al., 2005), after applying a 0.5Å random shift to all atoms. (C) Displaying per-residue validation on a scrollable plot. Outliers are flagged for real-space residual, all-atom clashes, conformation, and covalent-geometry, along with sequence and secondary-structure assignment. One-letter code enables a concise view, with further details shown on mouse-over. This example is from an entry at 2.7Å resolution, with average problems in the core but misfit regions at several transitions between helix and disordered loop. Plot modified from output of the MolProbityCompare utility by Bradley Hintze.
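
The Figure 2 caption equates the 99.95% contour (all but 1 in 2000) with 3.5σ. The short Python check below reproduces that arithmetic under an assumption that is ours rather than the caption's: that the equivalence refers to the two-sided tail of a one-dimensional normal distribution.

```python
import math
from statistics import NormalDist

# Fraction of filtered data lying outside the outer contour (1 in 2000).
excluded = 1.0 - 0.9995

# Two-sided tail probability beyond 3.5 sigma for a standard normal.
tail_at_3p5 = math.erfc(3.5 / math.sqrt(2.0))

# Sigma level whose two-sided tail equals the excluded fraction.
sigma_equiv = NormalDist().inv_cdf(1.0 - excluded / 2.0)

print(f"tail beyond 3.5 sigma: {tail_at_3p5:.2e}")
print(f"sigma for 1 in 2000:   {sigma_equiv:.2f}")
```

Under that assumption the two figures agree to within rounding: 3.5σ leaves out roughly 1 in 2150, and 1 in 2000 corresponds to about 3.48σ.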

References

    Adams P.D., Afonine P.V., Bunkóczi G., Chen V.B., Davis I.W., Echols N., Headd J.J., Hung L.-W., Kapral G.J., Grosse-Kunstleve R.W. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr. 2010;66:213–221.
    Afonine P.V., Grosse-Kunstleve R.W., Adams P.D. The Phenix refinement framework. CCP4 Newsletter. 2005;42 contribution 8.
    Afonine P.V., Grosse-Kunstleve R.W., Chen V.B., Headd J.J., Moriarty N.W., Richardson J.S., Richardson D.C., Urzhumtsev A., Zwart P.H., Adams P.D. phenix.model_vs_data: a high-level tool for the calculation of crystallographic model and data statistics. J. Appl. Cryst. 2010;43:669–676.
    Allen F.H. The Cambridge Structural Database: a quarter of a million crystal structures and rising. Acta Crystallogr. B. 2002;58:380–388.
    Arendall W.B., 3rd, Tempel W., Richardson J.S., Zhou W., Wang S., Davis I.W., Liu Z.-J., Rose J.P., Carson W.M., Luo M. A test of enhancing model accuracy in high-throughput crystallography. J. Struct. Funct. Genomics. 2005;6:1–11.
