J Appl Crystallogr. 2016 Apr 18;49(Pt 3):1021-1028.
doi: 10.1107/S1600576716005471. eCollection 2016 Jun 1.

Identification of rogue datasets in serial crystallography

Greta Assmann et al. J Appl Crystallogr.

Abstract

Advances in beamline optics, detectors and X-ray sources allow new techniques of crystallographic data collection. In serial crystallography, a large number of partial datasets from crystals of small volume are measured. Merging of datasets from different crystals in order to enhance data completeness and accuracy is only valid if the crystals are isomorphous, i.e. sufficiently similar in cell parameters, unit-cell contents and molecular structure. Identification and exclusion of non-isomorphous datasets is therefore indispensable and must be done by means of suitable indicators. To identify rogue datasets, the influence of each dataset on CC1/2 [Karplus & Diederichs (2012). Science, 336, 1030–1033], the correlation coefficient between pairs of intensities averaged in two randomly assigned subsets of observations, is evaluated. The presented method employs a precise calculation of CC1/2 that avoids the random assignment, and instead of using an overall CC1/2, an average over resolution shells is employed to obtain sensible results. The selection procedure was verified by measuring the correlation of observed (merged) intensities and intensities calculated from a model. It is found that inclusion and merging of non-isomorphous datasets may bias the refined model towards those datasets, and measures to reduce this effect are suggested.
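The leave-one-out idea described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the formula, so the sketch assumes the estimator CC1/2 = (σ_y² − v)/(σ_y² + v), where σ_y² is the variance of the merged mean intensities across unique reflections in a shell and v is the average variance of those means. The sign convention ΔCC1/2_i = CC1/2(all datasets) − CC1/2(all except i) is also an assumption, chosen so that a rogue dataset gets a negative value, as in the figure captions. The function names and the flat-array input layout are hypothetical.

```python
import numpy as np

def cc_half(hkl_ids, intensities):
    """CC1/2 for one resolution shell, computed without random half-splitting.

    Assumed estimator (not taken from the abstract):
        CC1/2 = (var_y - v) / (var_y + v)
    var_y: variance of merged mean intensities across unique reflections
    v:     average variance of those means (sample variance / multiplicity)
    """
    hkl_ids = np.asarray(hkl_ids)
    intensities = np.asarray(intensities, dtype=float)
    means, var_of_means = [], []
    for h in np.unique(hkl_ids):
        obs = intensities[hkl_ids == h]
        if obs.size < 2:          # a variance needs at least two observations
            continue
        means.append(obs.mean())
        var_of_means.append(obs.var(ddof=1) / obs.size)
    if len(means) < 2:            # too few reflections to correlate
        return np.nan
    var_y = np.var(means, ddof=1)
    v = np.mean(var_of_means)
    return (var_y - v) / (var_y + v)

def delta_cc_half(dataset_ids, shell_ids, hkl_ids, intensities):
    """Return {dataset: CC1/2(all) - CC1/2(all except that dataset)}.

    CC1/2 is averaged over resolution shells rather than computed overall.
    """
    dataset_ids = np.asarray(dataset_ids)
    shell_ids = np.asarray(shell_ids)
    hkl_ids = np.asarray(hkl_ids)
    intensities = np.asarray(intensities, dtype=float)

    def shell_average(mask):
        values = []
        for s in np.unique(shell_ids):
            sel = mask & (shell_ids == s)
            values.append(cc_half(hkl_ids[sel], intensities[sel]))
        return np.nanmean(values)  # shells without a usable CC1/2 are skipped

    cc_all = shell_average(np.ones(intensities.size, dtype=bool))
    return {d: cc_all - shell_average(dataset_ids != d)
            for d in np.unique(dataset_ids).tolist()}
```

Each observation is passed as one entry in four parallel arrays (dataset label, resolution-shell label, unique-reflection label, intensity). Datasets with the most negative values are the candidates for exclusion.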

Keywords: CC1/2; isomorphism; model bias; non-isomorphism; outlier identification; precision; serial crystallography.

Figures

Figure 1
Histogram of ΔCC1/2_i values for PepT. The −28.8σ unit outlier is indicated with an arrow.
Figure 2
Plot of ΔCCFOC_i against ΔCC1/2_i for PepT. The −28.8σ unit outlier (ΔCC1/2_i ≃ −4.8 × 10⁻⁴) is boxed.
Figure 3
Histogram of ΔCC1/2_i values for AlgE. The −14.8σ unit outlier is indicated with an arrow.
Figure 4
Plot of ΔCCFOC_i against ΔCC1/2_i for AlgE. Different colours and marker symbols refer to the different random shifts of the atom coordinates. Arrows indicate the change of ΔCCFOC_i upon increasing the magnitude of random shifts for the three most significant outliers of the Gaussian distribution of Fig. 3.
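The captions quote outliers in σ units of the ΔCC1/2_i distribution. How the width of that distribution was determined is not stated on this page, so the sketch below makes an assumption: it uses the median and the scaled median absolute deviation (MAD × 1.4826, which matches the standard deviation for Gaussian data) so that a single extreme dataset does not inflate the scale. The function name `sigma_units` is hypothetical.

```python
import numpy as np

def sigma_units(delta_values):
    """Express each delta value as a robust z-score.

    Assumption: centre = median, scale = 1.4826 * MAD. This is a generic
    robust estimator, not necessarily the procedure used for the figures.
    """
    values = np.asarray(delta_values, dtype=float)
    centre = np.median(values)
    mad = np.median(np.abs(values - centre))
    if mad == 0:
        raise ValueError("MAD is zero; a robust scale cannot be estimated")
    return (values - centre) / (1.4826 * mad)
```

Applied to a list of ΔCC1/2_i values, strongly negative scores flag the datasets worth inspecting; the cutoff is a judgment call and is not specified here.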
