Protein Sci. 2018 Jan;27(1):293-315.
doi: 10.1002/pro.3330. Epub 2017 Nov 27.

MolProbity: More and better reference data for improved all-atom structure validation


Christopher J Williams et al. Protein Sci. 2018 Jan.

Abstract

This paper describes the current update on macromolecular model validation services that are provided at the MolProbity website, emphasizing changes and additions since the previous review in 2010. There have been many infrastructure improvements, including rewrite of previous Java utilities to now use existing or newly written Python utilities in the open-source CCTBX portion of the Phenix software system. This improves long-term maintainability and enhances the thorough integration of MolProbity-style validation within Phenix. There is now a complete MolProbity mirror site at http://molprobity.manchester.ac.uk. GitHub serves our open-source code, reference datasets, and the resulting multi-dimensional distributions that define most validation criteria. Coordinate output after Asn/Gln/His "flip" correction is now more idealized, since the post-refinement step has apparently often been skipped in the past. Two distinct sets of heavy-atom-to-hydrogen distances and accompanying van der Waals radii have been researched and improved in accuracy, one for the electron-cloud-center positions suitable for X-ray crystallography and one for nuclear positions. New validations include messages at input about problem-causing format irregularities, updates of Ramachandran and rotamer criteria from the million quality-filtered residues in a new reference dataset, the CaBLAM Cα-CO virtual-angle analysis of backbone and secondary structure for cryoEM or low-resolution X-ray, and flagging of the very rare cis-nonProline and twisted peptides which have recently been greatly overused. Due to wide application of MolProbity validation and corrections by the research community, in Phenix, and at the worldwide Protein Data Bank, newly deposited structures have continued to improve greatly as measured by MolProbity's unique all-atom clashscore.

Keywords: Asn/Gln/His flip; CCTBX; CaBLAM; Top8000; all-atom contact analysis; cis non-proline; electron-cloud hydrogen position.

Figures

Figure 1
A, Time course showing strong improvement of MolProbity clashscores for the mid‐resolution half of deposits to the wwPDB from 1993 to mid‐2017. B, The validation “slider” and percentile system on the wwPDB web sites, which includes four criteria from MolProbity, illustrated for the 4pr6 HDV ribozyme at 2.3Å resolution [65].
Figure 2
Improved positioning of methyl hydrogens attached to planar rings. White bond vectors show the old, incorrect default and green lines the new result, which uses one of the two preferred orientations and matches the H difference peaks at +2.8σ (blue). From the 1gwe Micrococcus lysodeikticus catalase at 0.88Å resolution [67].
Figure 3
The new NQH flip output protocol starts with a simple 180° rotation, which does not give exactly superimposed atoms even for ideal geometry. That offset, and also more severe distortions, can be nearly corrected by two additional moves. A, The head groups of sidechains are often not in plane with their stems, resulting in a large shift of the terminal atoms when the sidechain head is rotated 180° (pink) from the original position (gold). A hinging motion brings the new head position back into the plane of the original. B, The rotated and hinged sidechain (pink) is still not well aligned to the original (gold) within that plane. C, A three‐point rigid dock motion, keeping the same Cα position, results in a final docked sidechain (green) with atoms nearly on top of the original ones (gold), but without added geometric distortion.
Figure 4
Shift of a high‐resolution H difference peak at 3.2σ (blue contours) toward its parent atom from the nuclear position. Trp Hɛ1 of the 1yk4 Pyrococcus abyssi rubredoxin at 0.69Å resolution [68].
Figure 5
Parent‐atom‐to‐hydrogen (x‐H) distances. Previous values are in gray for MolProbity nuclear and in black for ShelX/Phenix electron‐cloud center. New data sources are in dark green for CSD nuclear, lighter green for CSD X‐ray, gold for QM sphere‐fit, yellow for PDB H peaks, and red for COD adjusted (our most reliable e‐cloud values). Individual datapoints are in brown for NMR and in purple for electron diffraction. Our final adopted values are plotted as circles with an ESD radius, 0.05Å for SH and 0.02Å for all other atom‐pair types.
Figure 6
Clashscore vs resolution, for the Top8000 high‐quality reference dataset (see above). A, Clashscore for each structure by the previous MolProbity system (red), where few datapoints are at or just above zero. B, Clashscores for the same structures in the present MolProbity system (blue), where the scores do asymptote satisfactorily to zero.
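
As a rough illustration of the quantity plotted in Figure 6, the sketch below assumes the commonly quoted definition of clashscore: the number of serious all-atom steric overlaps (0.4 Å or more) per 1000 atoms. That definition is not spelled out on this page, and the function name and inputs are hypothetical; MolProbity itself obtains the overlap count from its all-atom contact analysis.

```python
def clashscore(n_serious_clashes: int, n_atoms: int) -> float:
    """Serious clashes per 1000 atoms (assumed definition, see lead-in).

    n_serious_clashes: count of atom pairs overlapping by >= 0.4 A
                       (hypothetical input, computed elsewhere).
    n_atoms:           total atoms in the model, including hydrogens.
    """
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    if n_serious_clashes < 0:
        raise ValueError("n_serious_clashes must be non-negative")
    return 1000.0 * n_serious_clashes / n_atoms


# Example: 12 serious clashes in a 3000-atom model.
score = clashscore(12, 3000)
```

Under this normalization a clash-free model scores exactly zero, which is the asymptote panel B refers to.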
Figure 7
All‐atom contact analysis. A,B, Histidine “flip” from clashing to good H‐bonds; 1bkr His42 at 1.1Å [69]. C,D, A peak originally fit as water, with clashes to nearby carboxyl oxygens, rebuilt as a sodium ion before deposition as 1xk8 at 2.7Å [70]. E,F, An Arg guanidinium next to an RNA phosphate but making no H‐bonds, then flipped over to a better position; 1s72 Arg 16 of ribosomal protein L3 at 2.4Å resolution [71].
Figure 8
The six Ramachandran plots currently used for backbone ϕ,ψ validation by MolProbity, Phenix, and the wwPDB: general case, Ile/Val, Gly, pre‐Pro, trans Pro, and cis Pro. Based on a million quality‐filtered residues in the Top8000 dataset.
Figure 9
Cis‐nonProline and twisted peptides. A, Time course for percent of PDB deposits each year with ≥ 30‐fold too many cis‐nonProline peptides, in 3 phases: first low, then high for 10 years and, after recognition, now abruptly decreasing. B, MolProbity graphics markup for cis‐nonPro (lime green) and for twisted peptides (>30°, in yellow), with the twist line emphasized.
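
The peptide categories in Figure 9 can be sketched as a simple test on the ω dihedral. The 30° threshold for "twisted" comes from the caption; treating it as a deviation from the nearest planar value (0° for cis, 180° for trans) is an assumption made here, and the function name is hypothetical.

```python
def classify_peptide(omega_deg: float, tol_deg: float = 30.0) -> str:
    """Label a peptide bond as 'trans', 'cis', or 'twisted' from omega.

    tol_deg is the allowed deviation from planarity; 30 degrees follows
    the Figure 9 caption, the rest of the logic is an illustrative sketch.
    """
    # Wrap omega into [-180, 180) so that 190 and -170 are treated alike.
    w = (omega_deg + 180.0) % 360.0 - 180.0
    dev_from_cis = abs(w)            # distance from 0 degrees
    dev_from_trans = 180.0 - abs(w)  # distance from +/-180 degrees
    if dev_from_trans <= tol_deg:
        return "trans"
    if dev_from_cis <= tol_deg:
        return "cis"
    return "twisted"


labels = [classify_peptide(w) for w in (180.0, -170.0, 5.0, 140.0)]
```

A cis label on a non-proline residue is the case the caption flags as very rare, so it should prompt a check against the density rather than be accepted by default.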
Figure 10
CaBLAM outlier and secondary‐structure diagnosis for 2o01, a large membrane protein at 3.4Å resolution [72]. Datapoints (black) for “disguised” helix residues plotted on A, the α − α (μin‐μout) projection and B, the α‐CO (μin‐ν) projection of the 3D CaBLAM plot contoured for general‐case reference data. These points are nearly all inside the red 2‐D contours for helix diagnosis (which are distinct from the green β contours), but about half are shown to be misfit outliers in the 3‐D space, along the CO dihedral axis. C, Details for the distorted model of a particular α‐helix. All nine residues have legal Cα dihedrals which score as helix with good probability, in spite of D, 5 out of 8 COs pointed in the wrong direction (hotpink and purple markup) and only one α‐helical H‐bond.
Figure 11
A, Key to MolProbity graphics markup for contacts and validation outliers. CaBLAM and non‐trans peptide markups are new. B, An example of the new three‐color system in the sortable html chart, and of the new non‐trans peptide reports in the right‐hand column, for 1qw9 [73]. Hotpink cells flag validation outliers, as before; pale pink cells are allowed but disfavored, and red cells are for extreme outliers. The single outlier in the rightmost column is Gly 73 cis‐nonPro; it is one of the rare valid ones, with excellent electron density and at the active site. Overall, however, this structure has more validation issues than usual at 1.2Å resolution. [Note that for large structures such as this, the chart default is to show only residues with an outlier.]
Figure 12
Virtual backbone dihedral angles in CaBLAM: μin (blue) and μout (green) defined by four successive Cα atoms, and ν (pink) to relate the direction between two successive carbonyl oxygens.
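
The virtual dihedrals in Figure 12 can be computed with an ordinary four-point dihedral routine. The sketch below is a minimal illustration, not the CCTBX implementation: the residue indexing (μin from Cα i-2..i+1, μout from Cα i-1..i+2) is an assumption not stated in the caption, and all function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("central atoms coincide")
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the axis b1.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = _sub(b0, tuple(d0 * c for c in b1))
    w = _sub(b2, tuple(d2 * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def cablam_mu(ca, i):
    """Return (mu_in, mu_out) for residue i from a list of CA coordinates.

    Indexing is an assumption: mu_in uses CA i-2..i+1, mu_out CA i-1..i+2.
    """
    if i < 2 or i + 2 >= len(ca):
        raise IndexError("need two CA atoms on each side of residue i")
    mu_in = dihedral_deg(ca[i - 2], ca[i - 1], ca[i], ca[i + 1])
    mu_out = dihedral_deg(ca[i - 1], ca[i], ca[i + 1], ca[i + 2])
    return mu_in, mu_out


# Toy coordinates (not a real protein), chosen so the angles are easy to check.
ca_trace = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 2)]
mu_in, mu_out = cablam_mu(ca_trace, 2)
```

The ν angle relates successive carbonyl oxygen directions and needs additional atoms, so it is left out of this sketch.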
Figure 13
A new RNA backbone suite conformer (named 3h), its recognition as valid aided by ERRASER calculations for this and related structures. This example forms an extended helix junction in the 3gx5 SAM riboswitch at 2.4Å resolution [74]. 2mFo‐DFc electron density at 1.2σ (gray) and 3σ (purple).

