Acta Crystallogr D Struct Biol. 2020 Apr 1;76(Pt 4):385-399.
doi: 10.1107/S2059798320003198. Epub 2020 Mar 31.

Scaling diffraction data in the DIALS software package: algorithms and new approaches for multi-crystal scaling

James Beilsten-Edmands et al. Acta Crystallogr D Struct Biol.

Abstract

In processing X-ray diffraction data, the intensities obtained from integration of the diffraction images must be corrected for experimental effects in order to place all intensities on a common scale both within and between data collections. Scaling corrects for effects such as changes in sample illumination, absorption and, to some extent, global radiation damage that cause the measured intensities of symmetry-equivalent observations to differ throughout a data set. This necessarily requires a prior evaluation of the point-group symmetry of the crystal. This paper describes and evaluates the scaling algorithms implemented within the DIALS data-processing package and demonstrates the effectiveness and key features of the implementation on example macromolecular crystallographic rotation data. In particular, the scaling algorithms enable new workflows for the scaling of multi-crystal or multi-sweep data sets, providing the analysis required to support current trends towards collecting data from ever-smaller samples. In addition, the implementation of a free-set validation method is discussed, which allows the quantification of the suitability of scaling-model and algorithm choices.

Keywords: crystallography; data analysis; diffraction; multi-crystal; scaling.

Figures

Figure 1
(a) Example of a 1D smooth scaling component. The scale factor at adjusted coordinate $x$ is determined by a Gaussian-weighted average of the nearest three parameters at $x_i$, with the weighting depending on the distances $|x - x_i|$. (b) Generalization of smooth scaling in higher dimensions, shown in 2D. The value of the component at adjusted position $(x, y)$ is a Gaussian-weighted average of the nearest three parameters along each dimension, with the weighting depending on the distances d-norm $= [(x - x_i)^2 + (y - y_i)^2]^{1/2}$.
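The 1D case in panel (a) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DIALS implementation: parameters are assumed to sit at integer coordinates 0, 1, 2, ..., and the function name `smooth_scale_1d` and the Gaussian `width` value are hypothetical choices made for this example.

```python
import math


def smooth_scale_1d(x, params, width=0.7):
    """Gaussian-weighted average of the three parameters nearest to x.

    Assumptions (not from the source): parameter i sits at coordinate i,
    and `width` controls how quickly the Gaussian weights fall off.
    """
    if len(params) < 3:
        raise ValueError("need at least three parameters")
    nearest = math.floor(x + 0.5)
    # Clamp the three-point window so it stays inside the parameter list.
    start = min(max(nearest - 1, 0), len(params) - 3)
    indices = range(start, start + 3)
    # Weights depend only on the distances |x - x_i|.
    weights = [math.exp(-((x - i) ** 2) / width) for i in indices]
    weighted_sum = sum(w * params[i] for w, i in zip(weights, indices))
    return weighted_sum / sum(weights)
```

Because the result is a normalized weighted average, a constant set of parameters returns that constant at any coordinate, and the value varies smoothly as `x` moves between parameter positions.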
Figure 2
Flow chart showing the main processes of the default scaling algorithm, which consists of several rounds of model optimization and outlier rejection, as well as optimization of profile and summation intensity combination and adjustment of uncertainty (error) estimates by refining a two-parameter error model.
Figure 3
Flow chart showing the stages of the incremental scaling algorithm, in which an additional prescaling round is performed using the already scaled data as a reference data set.
Figure 4
Flow chart showing the stages of the reference scaling algorithm, which uses the scaling-optimization algorithm with the reference set of intensities in the minimization target.
Figure 5
Scaling-model components determined for the weak but high-multiplicity thermolysin data set. The error bars indicate the standard parameter uncertainties determined from the final minimization cycle. (a) Inverse scale-factor correction for the scale correction $C_{hl}$ (parameter uncertainties are too small to distinguish). (b) Smoothly varying B factor for the decay correction $T_{hl}$. (c) Values of the spherical harmonic coefficients $P_{lm}$ that define the absorption surface correction. (d) The angular dependence of the absorption surface correction factor $S_{hl}$ in the crystal frame (i.e. $S_{hl}$ plotted for $-\mathbf{s}_0 = \mathbf{s}_1$).
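To show how a B factor in panel (b) translates into a resolution-dependent correction, here is a minimal sketch. The caption does not give the functional form, so the conventional relative-B-factor expression exp(B / (2d²)) is an assumption, as are the sign convention and the function name `decay_correction`.

```python
import math


def decay_correction(b_factor, d_spacing):
    """Decay term for one reflection, ASSUMING the form exp(B / (2 d^2)).

    b_factor  : relative B factor in A^2 (sign convention is an assumption)
    d_spacing : resolution of the reflection in A
    """
    if d_spacing <= 0:
        raise ValueError("d_spacing must be positive")
    return math.exp(b_factor / (2.0 * d_spacing ** 2))
```

Under this form a given B factor changes high-resolution reflections (small d) more strongly than low-resolution ones, which is the usual signature of global radiation damage.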
Figure 6
Resolution-dependent CC1/2 (a) and 〈I/σ〉 (b) for the scaled data set. (c) Normal probability plot of anomalous differences $\delta_{\mathrm{anom}} = (I^+ - I^-)/[\sigma^2(I^+) + \sigma^2(I^-)]^{1/2}$ (d > 2.13 Å). (d) Scatter plot of $\Delta I_{\mathrm{anom}}$ pairs ($\Delta I_1 = \langle I^+\rangle_1 - \langle I^-\rangle_1$, $\Delta I_2 = \langle I^+\rangle_2 - \langle I^-\rangle_2$ for random splitting of $I^+$ and $I^-$) and (e) anomalous correlation ratio for acentric reflections [ratio of r.m.s. deviation along y = x to r.m.s. deviation along y = −x in (d)].
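The two anomalous statistics in panels (c) and (e) can be written out as a short sketch. The function names are hypothetical, and measuring the r.m.s. deviations about the origin (rather than about the mean of the scatter) is an assumption of this example.

```python
import math


def delta_anom(i_plus, i_minus, sigma_plus, sigma_minus):
    """Normalized anomalous difference (I+ - I-) / sqrt(s^2(I+) + s^2(I-))."""
    return (i_plus - i_minus) / math.sqrt(sigma_plus ** 2 + sigma_minus ** 2)


def anomalous_correlation_ratio(pairs):
    """Ratio of r.m.s. spread along y = x to r.m.s. spread along y = -x.

    pairs: list of (dI_1, dI_2) values from two random half data sets.
    Deviations are taken about the origin (an assumption of this sketch).
    """
    def rms(values):
        return math.sqrt(sum(v * v for v in values) / len(values))

    # Project each point onto the diagonal and the anti-diagonal.
    along_diagonal = [(d1 + d2) / math.sqrt(2.0) for d1, d2 in pairs]
    along_antidiagonal = [(d1 - d2) / math.sqrt(2.0) for d1, d2 in pairs]
    return rms(along_diagonal) / rms(along_antidiagonal)
```

A ratio well above 1 means that the two half-set differences agree (points stretch along y = x), whereas uncorrelated noise gives a ratio close to 1. The sketch does not guard against a zero denominator, which would occur only if the two half sets agreed exactly.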
Figure 7
(a) Normal probability plot of the normalized deviations $\delta_{hl}$ after error-model correction compared with an expected normal distribution (solid line), showing good overall agreement but with some discrepancy for deviations below −2. (b) Comparison of the variance of the normalized deviations, binned by intensity, before and after error-model correction, which reduces the variances close to the target of unity across the intensity range.
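As a rough illustration of what an error-model correction does to the normalized deviations, the sketch below assumes the common two-parameter form σ′² = a²[σ² + (bI)²]. That form, the parameter names `a` and `b`, and the simplified deviation (observation minus the unweighted group mean, with no scale factors or multiplicity correction) are all assumptions of this example rather than details given in the caption.

```python
import math
import statistics


def adjusted_sigma(intensity, sigma, a, b):
    """Adjusted uncertainty, ASSUMING sigma'^2 = a^2 [sigma^2 + (b I)^2]."""
    return a * math.sqrt(sigma ** 2 + (b * intensity) ** 2)


def normalized_deviations(intensities, sigmas, a, b):
    """Deviation of each symmetry-equivalent observation from the group mean,
    divided by its adjusted uncertainty (simplified: unweighted mean,
    no scale factors, no multiplicity correction)."""
    mean_intensity = statistics.mean(intensities)
    return [
        (i - mean_intensity) / adjusted_sigma(i, s, a, b)
        for i, s in zip(intensities, sigmas)
    ]
```

If the adjusted uncertainties are realistic, the variance of these deviations should be close to 1 in every intensity bin, which is the target shown in panel (b).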
Figure 8
Initial exploratory scaling and filtering analysis of a multi-crystal TehA data set, performed on 263 groups of ten images. (a) The distribution of ΔCC1/2 values for the image groups after each round (n) of scaling, with the plots limited to counts of 4 and below to display low-count histogram bins. Groups with $\Delta\mathrm{CC}_{1/2}^{i} < \langle\Delta\mathrm{CC}_{1/2}\rangle - 4\sigma$ are shown in red and were removed by the filtering algorithm (the 4σ cutoff is indicated by the dashed line). From the ninth cycle, the lowest $\Delta\mathrm{CC}_{1/2}^{i}$ could be considered to be within the tail of the central distribution. (b, c, d) Resolution-averaged CC1/2 (σ–τ), 〈I/σ〉 and $R_{\mathrm{p.i.m.}}$ per cycle, which show significant improvement in the first seven cycles of scaling and filtering, with more gradual improvement towards the end of the ten cycles.
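The removal criterion in panel (a) amounts to a simple outlier test on the per-group ΔCC1/2 values. The sketch below shows one filtering pass; the function name `groups_to_remove` is hypothetical, and using the population standard deviation for σ is an assumption, since the caption does not say how σ is estimated.

```python
import statistics


def groups_to_remove(delta_cc_half, n_sigma=4.0):
    """Indices of image groups whose delta-CC1/2 lies below mean - n_sigma * sigma.

    delta_cc_half: one delta-CC1/2 value per image group.
    The population standard deviation is used here (an assumption).
    """
    mean = statistics.mean(delta_cc_half)
    sigma = statistics.pstdev(delta_cc_half)
    cutoff = mean - n_sigma * sigma
    return [i for i, value in enumerate(delta_cc_half) if value < cutoff]
```

In the workflow described by the caption, a pass like this would alternate with rescaling of the remaining data, stopping once no group falls below the cutoff or the lowest values merge into the tail of the main distribution.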
Figure 9
(a) CC1/2 and 〈I/σ〉 of the selected data set (after eight cycles of scaling and exclusion). (b) The image ranges retained or removed for the selected data set. The whole of the first two sweeps was removed, in addition to the ends of several other sweeps, removing 3.4% of the reflections from the initial data set.
Figure 10
Incremental scaling of 20° rotation sweeps collected in situ. (a) Completeness and (b) correlation coefficient for the number of sweeps in the combined data set. Each sweep is added individually to the combined scaled data set measured up to that point, triggering the reference-scaling algorithm. As each sweep is added, the completeness is monitored until a given completeness is obtained across the resolution range (98% in this example, which is obtained after adding sweep 12).
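The stopping rule described in this caption can be sketched as a loop over sweeps. This toy version tracks only which unique reflections have been observed and performs no scaling; it also checks overall completeness rather than completeness in each resolution shell. The function name `sweeps_needed` and the representation of a sweep as a set of unique reflection indices are hypothetical.

```python
def sweeps_needed(sweeps, n_possible, target=0.98):
    """Number of sweeps to combine before completeness reaches `target`.

    sweeps     : sequence of sets of unique reflection indices (hypothetical
                 representation; one set per rotation sweep)
    n_possible : number of unique reflections expected in the resolution range
    Returns None if the target is never reached.
    """
    observed = set()
    for count, sweep in enumerate(sweeps, start=1):
        # In the real workflow, adding a sweep would trigger reference scaling.
        observed |= set(sweep)
        if len(observed) / n_possible >= target:
            return count
    return None
```

For example, `sweeps_needed([{1, 2}, {2, 3}, {4}], n_possible=4, target=0.75)` returns 2, because the first two sweeps together cover three of the four possible reflections.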
Figure 11
The difference in the free $R_{\mathrm{meas}}$ determined using all reflections and using a subset of reflections [$\Delta R_{\mathrm{free}}$ = free $R_{\mathrm{meas}}$ − free $R_{\mathrm{meas}}$(all)] plotted against the minimum number of groups (top) and the number of reflections in the subset (bottom) for (a, d) single-sweep data sets, (b, e) multiple-sweep data sets and (c, f) a multi-crystal data set. For single sweeps a random group selection is used, whereas for multiple-sweep and multi-crystal data a random (R) and a quasi-random (QR) algorithm are tested. Based on the observed trends, two criteria were chosen for the reflection subset to be used for scaling-model optimization: it must contain at least 2000 groups and 50 000 reflections, as indicated by the vertical dashed lines.
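The two subset criteria can be illustrated with a simple random group selection, as described for single sweeps. This is a sketch only: the function name `select_subset`, the dictionary input and the fixed seed are hypothetical, and the quasi-random (QR) algorithm mentioned in the caption is not reproduced here.

```python
import random


def select_subset(group_sizes, min_groups=2000, min_reflections=50_000, seed=0):
    """Randomly pick symmetry-equivalent groups until both criteria are met.

    group_sizes: dict mapping a group identifier to its number of reflections.
    Returns (selected group ids, total reflections selected). If the data set
    is too small to satisfy both criteria, every group is returned.
    """
    group_ids = list(group_sizes)
    random.Random(seed).shuffle(group_ids)
    selected, n_reflections = [], 0
    for group_id in group_ids:
        if len(selected) >= min_groups and n_reflections >= min_reflections:
            break
        selected.append(group_id)
        n_reflections += group_sizes[group_id]
    return selected, n_reflections
```

With low-multiplicity data the reflection count is the limiting criterion (more than 2000 groups are needed to reach 50 000 reflections), whereas with high-multiplicity data the group count is limiting.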
