Structure. 2017 Dec 5;25(12):1916-1927.
doi: 10.1016/j.str.2017.10.009. Epub 2017 Nov 22.

Validation of Structures in the Protein Data Bank

Swanand Gore et al. Structure.

Abstract

The Worldwide PDB recently launched a deposition, biocuration, and validation tool: OneDep. At various stages of OneDep data processing, validation reports for three-dimensional structures of biological macromolecules are produced. These reports are based on recommendations of expert task forces representing crystallography, nuclear magnetic resonance, and cryoelectron microscopy communities. The reports provide useful metrics with which depositors can evaluate the quality of the experimental data, the structural model, and the fit between them. The validation module is also available as a stand-alone web server and as a programmatically accessible web service. A growing number of journals require the official wwPDB validation reports (produced at biocuration) to accompany manuscripts describing macromolecular structures. Upon public release of the structure, the validation report becomes part of the public PDB archive. Geometric quality scores for proteins in the PDB archive have improved over the past decade.

Keywords: 3D macromolecular structure; PDB; biocuration; data archiving; data deposition; structural biology; structure data quality; validation; wwPDB.

Figures

Figure 1
Summary Quality Metrics in the wwPDB Validation Reports. Sliders (top) and residue plots (bottom). (A) A relatively good structure; (B) a relatively poor structure. The solid sliders report how a given structure ranks relative to all structures in the PDB. The open sliders report the comparison with structures derived in a similar fashion (X-ray crystallographic structures are compared with other X-ray structures solved at a similar resolution, while NMR and EM structures are ranked relative to other NMR and EM structures in the PDB, respectively). Residue sequence plots flag residues that have unusual geometric features (i.e., bond length, bond angle, Ramachandran, RNA suiteness, or other torsion-angle outliers). Residues are color coded as follows: green, no geometric outliers; yellow, one type of outlier; orange, two types of outliers; red, three or more types of outliers; gray, atomic coordinates not available; cyan, atomic coordinates ill-defined by the NMR ensemble. For X-ray crystal structures, a red dot above a residue indicates a poor fit to the electron density (RSRZ > 2).
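The color legend in the Figure 1 caption amounts to a small lookup on the number of distinct outlier types per residue. The sketch below is only an illustration of that rule, not the wwPDB implementation: the function names, the use of `None` for missing coordinates, and the choice to let the gray and cyan states take precedence over outlier counts are all assumptions made here.

```python
from typing import Optional, Set


def residue_color(outlier_types: Optional[Set[str]], ill_defined: bool = False) -> str:
    """Return the legend color for one residue (hypothetical helper).

    outlier_types: distinct kinds of geometric outlier found for the residue,
                   or None when atomic coordinates are not available.
    ill_defined:   True when the residue is ill-defined by the NMR ensemble.
    """
    if outlier_types is None:
        return "gray"      # no coordinates to assess
    if ill_defined:
        return "cyan"      # assumption: this state overrides outlier counts
    count = len(outlier_types)
    if count == 0:
        return "green"
    if count == 1:
        return "yellow"
    if count == 2:
        return "orange"
    return "red"           # three or more types of outliers


def poor_density_fit(rsrz: float) -> bool:
    """Red-dot criterion from the caption: RSRZ greater than 2."""
    return rsrz > 2.0


if __name__ == "__main__":
    print(residue_color({"bond angle", "Ramachandran"}))  # two outlier types
    print(poor_density_fit(2.7))
```

A residue with both a bond-angle and a Ramachandran outlier would be drawn orange under this rule, and an RSRZ of 2.7 would earn the red dot.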
Figure 2
The 25 Journals That Publish the Most Papers Describing PDB Structures, Ranked by Their Citation in the PDB from 2012 to 2016. Journals that require wwPDB validation reports for manuscript review are shown in black, while those that do not yet require the reports are shown in gray. Obsoleted entries count toward these statistics only if they were superseded by a different PDB entry; obsoleted (retracted) entries were excluded.
Figure 3
Trends in Geometric Quality Metrics for Protein Structures in the PDB. Trends between 1995 and 2016 in geometric validation scores for X-ray crystal and NMR entries in the PDB, as reported by MolProbity (Chen et al., 2010). (A–C) Validation metrics for X-ray crystal structures: (A) Ramachandran outliers; (B) rotamer outliers; (C) clashscore. (D–F) Metrics for well-defined regions of solution NMR structures: (D) Ramachandran outliers; (E) rotamer outliers; (F) clashscore. In each plot, the thick red line represents the median value of the metric for the given year, the box shows the interquartile range (25%–75%), and the whiskers show the 1%–99% range. The worst and the best 1% of entries (outside the whisker range) are plotted as dots.
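The box-plot convention described in the Figure 3 caption (median, 25%–75% box, 1%–99% whiskers, everything else as dots) can be reproduced for one year of scores with the Python standard library. This is a minimal sketch: `BoxSummary` and `box_summary` are hypothetical names, and the inclusive percentile interpolation is an assumption, since the caption does not say which percentile definition was used.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import List, Sequence


@dataclass
class BoxSummary:
    p1: float              # lower whisker (1st percentile)
    q1: float              # bottom of the box (25th percentile)
    median: float          # thick line (50th percentile)
    q3: float              # top of the box (75th percentile)
    p99: float             # upper whisker (99th percentile)
    outliers: List[float]  # entries outside the whiskers, drawn as dots


def box_summary(values: Sequence[float]) -> BoxSummary:
    """Summarize one year's scores using the caption's box-plot convention."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(values, n=100, method="inclusive")
    p1, q1, median, q3, p99 = cuts[0], cuts[24], cuts[49], cuts[74], cuts[98]
    outliers = sorted(v for v in values if v < p1 or v > p99)
    return BoxSummary(p1, q1, median, q3, p99, outliers)


if __name__ == "__main__":
    # Synthetic scores 0..100 stand in for real per-entry values.
    print(box_summary(list(range(101))))
```

Applying `box_summary` to each deposition year separately would give the per-year boxes shown in the panels.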
Figure 4
Trends in Geometric Quality Metrics for Small Molecules in the PDB. Trends between 1995 and 2016 in bond-length and bond-angle RMSZ metrics, as determined by Mogul (Bruno et al., 2004), for small molecules in X-ray crystal structures in the PDB at better than 2.5 Å resolution. (A) Ligands with 1–20 non-hydrogen atoms; (B) ligands with 21–40 non-hydrogen atoms; (C) ligands with 41–60 non-hydrogen atoms. In each box plot, the thick red line represents the median value per year, the box shows the interquartile range (25%–75%), and the whiskers show the 1%–99% range. Values outside the whisker range are plotted as dots. In each plot, the top panel shows the bond-length RMSZ metric, the middle panel shows the bond-angle RMSZ metric, and the bottom panel shows the number of such ligands deposited in each year.
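An RMSZ value, as plotted in Figure 4, is the root mean square of per-bond (or per-angle) Z-scores, where each Z-score measures how far an observed value lies from a reference mean in units of the reference standard deviation. The sketch below shows only that arithmetic. The function name and the example numbers are illustrative assumptions; in the reports the reference means and standard deviations come from Mogul, which is not called here.

```python
from math import sqrt
from typing import Sequence


def rmsz(observed: Sequence[float],
         ref_mean: Sequence[float],
         ref_sd: Sequence[float]) -> float:
    """Root-mean-square Z-score over a set of bonds or angles."""
    if not (len(observed) == len(ref_mean) == len(ref_sd)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Z-score: deviation from the reference mean in units of reference sd.
    z_scores = [(o - m) / s for o, m, s in zip(observed, ref_mean, ref_sd)]
    return sqrt(sum(z * z for z in z_scores) / len(z_scores))


if __name__ == "__main__":
    # Made-up bond lengths in Å: each is one reference sd from its mean.
    print(rmsz([1.54, 1.50], [1.52, 1.52], [0.02, 0.02]))
```

With both example bonds one standard deviation from their reference means, the RMSZ comes out at approximately 1.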

References

Adams P.D., Afonine P.V., Bunkoczi G., Chen V.B., Davis I.W., Echols N., Headd J.J., Hung L.-W., Kapral G.J., Grosse-Kunstleve R.W. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr. 2010;66:213–221.
Adams P.D., Aertgeerts K., Bauer C., Bell J.A., Berman H.M., Bhat T.N., Blaney J.M., Bolton E., Bricogne G., Brown D. Outcome of the first wwPDB/CCDC/D3R ligand validation workshop. Structure. 2016;24:502–508.
Berjanskii M.V., Wishart D.S. A simple method to predict protein flexibility using secondary chemical shifts. J. Am. Chem. Soc. 2005;127:14970–14971.
Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242.
Berman H.M., Henrick K., Nakamura H. Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 2003;10:980.