Better models by discarding data?
- PMID: 23793147
- PMCID: PMC3689524
- DOI: 10.1107/S0907444913001121
Abstract
In macromolecular X-ray crystallography, typical data sets have substantial multiplicity. This can be used to calculate the consistency of repeated measurements and thereby assess data quality. Recently, the properties of a correlation coefficient, CC1/2, that can be used for this purpose were characterized and it was shown that CC1/2 has superior properties compared with 'merging' R values. A derived quantity, CC*, links data and model quality. Using experimental data sets, the behaviour of CC1/2 and that of the more conventional indicators were compared in two situations of practical importance: merging data sets from different crystals and selectively rejecting weak observations or (merged) unique reflections from a data set. In these situations, controlled 'paired-refinement' tests show that even though discarding the weaker data leads to improvements in the merging R values, the refined models based on these data are of lower quality. These results show the folly of such data-filtering practices aimed at improving the merging R values. Interestingly, in all of these tests CC1/2 is the one data-quality indicator for which the behaviour accurately reflects which of the alternative data-handling strategies results in the best-quality refined model. Its properties in the presence of systematic error are documented and discussed.
Keywords: R value; correlation coefficient; data quality; model quality; outlier rejection.
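As a rough illustration of the two indicators named in the abstract, the sketch below computes CC1/2 as the Pearson correlation between the mean intensities of two random half-sets of the repeated measurements, and derives CC* from it using the commonly quoted relation CC* = sqrt(2·CC1/2 / (1 + CC1/2)). The abstract itself does not spell out these formulas, so the relation is taken from general usage; the function names (`pearson`, `cc_half`, `cc_star`) and the dictionary layout of the input are assumptions made for this example and do not correspond to any particular crystallographic package.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cc_half(observations, rng):
    """Estimate CC1/2 from repeated intensity measurements.

    `observations` (hypothetical layout) maps a unique-reflection id to the
    list of its repeated intensity measurements. Each list is randomly split
    into two halves, each half is averaged, and the two sets of half-set
    means are correlated. Reflections measured only once are skipped.
    """
    half_a, half_b = [], []
    for intensities in observations.values():
        if len(intensities) < 2:
            continue
        shuffled = list(intensities)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half_a.append(mean(shuffled[:mid]))
        half_b.append(mean(shuffled[mid:]))
    return pearson(half_a, half_b)


def cc_star(cc12):
    """CC* = sqrt(2*CC1/2 / (1 + CC1/2)); treated here as undefined for CC1/2 < 0."""
    if cc12 < 0:
        raise ValueError("CC* is not defined for negative CC1/2 in this sketch")
    return math.sqrt(2.0 * cc12 / (1.0 + cc12))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: 200 reflections, 6 noisy measurements each (made-up values).
    data = {}
    for hkl in range(200):
        true_intensity = rng.uniform(10.0, 1000.0)
        data[hkl] = [true_intensity + rng.gauss(0.0, 50.0) for _ in range(6)]
    cc12 = cc_half(data, rng)
    print("CC1/2 =", round(cc12, 3), " CC* =", round(cc_star(cc12), 3))
```

Because the half-set split is random, repeated runs with different seeds give slightly different CC1/2 values; in practice the statistic is usually reported per resolution shell rather than over the whole data set as done here.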