Methods Mol Biol. 2014;1091:297-314.
doi: 10.1007/978-1-62703-691-7_21.

The quality and validation of structures from structural genomics

Marcin J Domagalski et al. Methods Mol Biol. 2014.

Abstract

Quality control of three-dimensional structures of macromolecules is a critical step to ensure the integrity of structural biology data, especially those produced by structural genomics centers. Whereas the Protein Data Bank (PDB) has proven to be a remarkable success overall, the inconsistent quality of structures reveals a lack of universal standards for structure/deposit validation. Here, we review the state-of-the-art methods used in macromolecular structure validation, focusing on validation of structures determined by X-ray crystallography. We describe some general protocols used in the rebuilding and re-refinement of problematic structural models. We also briefly discuss some frontier areas of structure validation, including refinement of protein-ligand complexes, automation of structure redetermination, and the use of NMR structures and computational models to solve X-ray crystal structures by molecular replacement.

Figures

Fig. 1
Average number of missing parameters in the PDB file headers for PSI high-throughput (PSI-HT) centers, structural genomics worldwide (excluding the PSI-HT centers), and traditional structural biology laboratories. A small number of “NULL” values is always present because of the way PDB file headers are generated: not all parameters are relevant to all kinds of experiments
Fig. 2
Normalized distribution of high-resolution limits for X-ray structures solved by PSI high-throughput (PSI-HT) centers, structural genomics worldwide excluding PSI-HT, and traditional structural biology laboratories
Fig. 3
Selected structure quality metrics of all PDB deposits with the same first author (the author was selected randomly from all authors with >200 deposits). (a) Distribution of R (red) and Rfree (blue) as a function of resolution, along with trendlines determined by linear regression. (b) Distribution of MolProbity clashscore percentile (ranking of the “raw” clashscore relative to other structures in the PDB of similar resolution)
Fig. 4
Distributions of mean I/σ(I) for the highest resolution shell vs. mean I/σ(I) for all reflections, as determined for different sets of structures in the PDB. (a) Distribution for all structures determined by X-ray crystallography. (b) Distribution for all X-ray structures solved since April 2011. (c) Distribution for all X-ray structures solved since April 2011 by the four high-throughput PSI centers. On all distributions, the conventional threshold of 2.0 for mean I/σ(I) in the highest resolution shell is marked by a red line. A significant number of structures report identical values for the two quantities, and a number report a mean I/σ(I) for the highest resolution shell that is greater than the mean for all reflections, a physically improbable outcome
Fig. 5
(a) Distribution of R factor vs. resolution for all X-ray structures deposited in the PDB since April 2011. Structures solved by SG centers are marked in red and structures solved by traditional laboratories are in blue. The lines represent linear regression trend lines for the two sets of structures in the same color scheme. (b) Distribution of Rfree factors vs. resolution for all X-ray structures deposited in the PDB since April 2011, using the same color scheme as part (a)
Fig. 6
A screen shot of the “Check waters” tool in HKL-3000
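
The consistency check implied by the Fig. 4 caption can be sketched in a few lines of Python. This is only an illustration, not the procedure used by the authors: the `DataStats` record, its field names, the flag labels, and the `classify` function are all hypothetical, and only the 2.0 threshold and the two anomalies (identical values, highest shell exceeding the overall mean) come from the caption.

```python
import math
from dataclasses import dataclass


@dataclass
class DataStats:
    """Hypothetical record of two values reported for one deposit."""
    deposit_id: str          # placeholder label, not a real PDB code
    i_sigma_overall: float   # mean I/sigma(I) for all reflections
    i_sigma_outer: float     # mean I/sigma(I) for the highest resolution shell


def classify(stats: DataStats, threshold: float = 2.0) -> list[str]:
    """Return flags for the patterns described in the Fig. 4 caption."""
    flags = []
    # Below the conventional threshold: worth a look, not necessarily an error.
    if stats.i_sigma_outer < threshold:
        flags.append("below_conventional_threshold")
    # Identical values suggest one number was copied into both header fields.
    if math.isclose(stats.i_sigma_outer, stats.i_sigma_overall, abs_tol=1e-9):
        flags.append("identical_values")
    # The outer shell should be weaker than the data set as a whole.
    elif stats.i_sigma_outer > stats.i_sigma_overall:
        flags.append("outer_exceeds_overall")
    return flags


if __name__ == "__main__":
    examples = [
        DataStats("example-1", 15.0, 2.5),
        DataStats("example-2", 12.0, 12.0),
        DataStats("example-3", 8.0, 9.5),
        DataStats("example-4", 20.0, 1.2),
    ]
    for record in examples:
        print(record.deposit_id, classify(record))
```

The example values are invented for demonstration. A flag here indicates only that the reported header values merit a closer look, since the cause may be a data-entry problem rather than a problem with the diffraction data.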

