Acta Crystallogr D Struct Biol. 2020 Apr 1;76(Pt 4):385-399.

doi: 10.1107/S2059798320003198. Epub 2020 Mar 31.

Scaling diffraction data in the DIALS software package: algorithms and new approaches for multi-crystal scaling

James Beilsten-Edmands¹, Graeme Winter¹, Richard Gildea¹, James Parkhurst¹, David Waterman², Gwyndaf Evans¹

Affiliations

¹ Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot OX11 0DE, United Kingdom.
² STFC, Rutherford Appleton Laboratory, Didcot OX11 0FA, United Kingdom.

PMID: 32254063
PMCID: PMC7137103
DOI: 10.1107/S2059798320003198

James Beilsten-Edmands et al. Acta Crystallogr D Struct Biol. 2020.

Abstract

In processing X-ray diffraction data, the intensities obtained from integration of the diffraction images must be corrected for experimental effects in order to place all intensities on a common scale both within and between data collections. Scaling corrects for effects such as changes in sample illumination, absorption and, to some extent, global radiation damage that cause the measured intensities of symmetry-equivalent observations to differ throughout a data set. This necessarily requires a prior evaluation of the point-group symmetry of the crystal. This paper describes and evaluates the scaling algorithms implemented within the DIALS data-processing package and demonstrates the effectiveness and key features of the implementation on example macromolecular crystallographic rotation data. In particular, the scaling algorithms enable new workflows for the scaling of multi-crystal or multi-sweep data sets, providing the analysis required to support current trends towards collecting data from ever-smaller samples. In addition, the implementation of a free-set validation method is discussed, which allows the quantification of the suitability of scaling-model and algorithm choices.

Keywords: crystallography; data analysis; diffraction; multi-crystal; scaling.

open access.

Figures

**Figure 1**
(a) Example of a 1D smooth scaling component. The scale factor at adjusted coordinate x is determined by a Gaussian-weighted average of the nearest three parameters at *x_i*, with the weighting depending on the distances |x − *x_i*|. (b) Generalization of smooth scaling in higher dimensions, shown in 2D. The value of the component at adjusted position (x, y) is a Gaussian-weighted average of the nearest three parameters along each dimension, with the weighting depending on the distances d_norm = [(x − *x_i*)² + (y − *y_i*)²]^(1/2).
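
The 1D case in panel (a) can be sketched in a few lines of Python. This is a minimal illustration, not the DIALS implementation: the function name `smooth_scale_1d`, the placement of parameters at integer positions and the Gaussian width `sigma` are all assumptions made for the example.

```python
import math


def smooth_scale_1d(x, params, sigma=0.7):
    """Gaussian-weighted average of the three parameters nearest to x.

    Assumptions (not from the paper): parameters sit at integer positions
    x_i = 0, 1, ..., len(params) - 1, and `sigma` is an arbitrary width.
    """
    if len(params) < 3:
        raise ValueError("need at least three parameters")
    # Indices of the three parameter positions closest to x.
    nearest = sorted(range(len(params)), key=lambda i: abs(x - i))[:3]
    # Weight each parameter by a Gaussian in the distance |x - x_i|.
    weights = [math.exp(-((x - i) ** 2) / (2 * sigma ** 2)) for i in nearest]
    total = sum(weights)
    return sum(w * params[i] for w, i in zip(weights, nearest)) / total
```

Because the weights are normalized, a constant set of parameters returns that constant everywhere, and the value varies smoothly as x moves between parameter positions.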

**Figure 2**
Flow chart showing the main processes of the default scaling algorithm, which consists of several rounds of model optimization and outlier rejection, as well as optimization of profile and summation intensity combination and adjustment of uncertainty (error) estimates by refining a two-parameter error model.

**Figure 3**
Flow chart showing the stages of the incremental scaling algorithm, in which an additional prescaling round is performed using the already scaled data as a reference data set.

**Figure 4**
Flow chart showing the stages of the reference scaling algorithm, which uses the scaling-optimization algorithm with the reference set of intensities in the minimization target.

**Figure 5**
Scaling-model components determined for the weak but high-multiplicity thermolysin data set. The error bars indicate the standard parameter uncertainties determined from the final minimization cycle. (a) Inverse scale-factor correction for the scale correction *C_hl* (parameter uncertainties are too small to distinguish). (b) Smoothly varying B factor for the decay correction *T_hl*. (c) Values of the spherical harmonic coefficients *P_lm* that define the absorption surface correction. (d) The angular dependence of the absorption surface correction factor *S_hl* in the crystal frame (*i.e.* *S_hl* plotted for −s₀ = s₁).

**Figure 6**
Resolution-dependent CC_1/2 (a) and 〈I/σ〉 (b) for the scaled data set. (c) Normal probability plot of anomalous differences δ_anom = (I⁺ − I⁻)/[σ²(I⁺) + σ²(I⁻)]^(1/2) (d > 2.13 Å). (d) Scatter plot of ΔI_anom pairs (ΔI₁ = 〈I⁺〉₁ − 〈I⁻〉₁, ΔI₂ = 〈I⁺〉₂ − 〈I⁻〉₂ for random splitting of I⁺ and I⁻) and (e) anomalous correlation ratio for acentric reflections [ratio of r.m.s. deviation along y = x to r.m.s. deviation along y = −x in (d)].
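
The two quantities defined in this caption can be written out directly. The sketch below is illustrative only: the function names are hypothetical, and the r.m.s. deviations are taken about the origin, which is an assumption since the caption does not say whether the mean is subtracted.

```python
import math


def delta_anom(i_plus, i_minus, var_plus, var_minus):
    """Normalized anomalous difference (I+ - I-) / sqrt(var(I+) + var(I-))."""
    return (i_plus - i_minus) / math.sqrt(var_plus + var_minus)


def anomalous_correlation_ratio(pairs):
    """Ratio of the r.m.s. spread along y = x to that along y = -x.

    `pairs` is a list of (dI_1, dI_2) anomalous differences from two
    random half data sets. Deviations are measured about the origin
    (an assumption made for this sketch).
    """
    # Project each point onto the diagonal and the anti-diagonal.
    along = [(d1 + d2) / math.sqrt(2) for d1, d2 in pairs]
    across = [(d1 - d2) / math.sqrt(2) for d1, d2 in pairs]

    def rms(values):
        return math.sqrt(sum(v * v for v in values) / len(values))

    if rms(across) == 0:
        raise ValueError("no spread along y = -x; ratio is undefined")
    return rms(along) / rms(across)
```

A ratio well above 1 means the two half-set differences agree more than they disagree, which is the signature of a real anomalous signal; a ratio near 1 indicates noise.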

**Figure 7**
(a) Normal probability plot of the normalized deviations δ_hl after error-model correction compared with an expected normal distribution (solid line), showing good overall agreement but with some discrepancy for deviations below −2. (b) Comparison of the variance of the normalized deviations, binned by intensity, before and after error-model correction, which reduces the variances close to the target of unity across the intensity range.
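
The check in panel (b) can be sketched as follows. The caption does not give the functional form of the error model, so the commonly used two-parameter form σ′² = a²[σ² + (bI)²] is assumed here; the function names and the equal-count binning are likewise illustrative choices rather than the DIALS implementation.

```python
import math
import statistics


def corrected_sigma(sigma, intensity, a, b):
    """Assumed two-parameter form: sigma'^2 = a^2 * (sigma^2 + (b * I)^2)."""
    return a * math.sqrt(sigma ** 2 + (b * intensity) ** 2)


def variance_by_intensity_bin(deviations, intensities, n_bins):
    """Variance of normalized deviations in equal-count intensity bins.

    A well-calibrated error model should give values close to 1 in
    every bin.
    """
    n = len(deviations)
    if n != len(intensities) or n_bins < 1 or n < n_bins:
        raise ValueError("inconsistent input sizes")
    # Order observations by intensity, then cut into equal-count bins.
    order = sorted(range(n), key=lambda i: intensities[i])
    variances = []
    for k in range(n_bins):
        chunk = order[k * n // n_bins:(k + 1) * n // n_bins]
        variances.append(statistics.pvariance([deviations[i] for i in chunk]))
    return variances
```

Bins whose variance stays well above 1 after correction would suggest that the uncertainties in that intensity range are still underestimated.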

**Figure 8**
Initial exploratory scaling and filtering analysis on a multi-crystal TehA data set on 263 groups of ten images. (a) The distribution of ΔCC_1/2 values for the image groups after each round (n) of scaling, with the plots limited to counts of 4 and below to display low-count histogram bins. Groups with ΔCC_1/2^i < 〈ΔCC_1/2〉 − 4σ are shown in red and were removed by the filtering algorithm (the 4σ cutoff is indicated by the dashed line). From the ninth cycle, the lowest ΔCC_1/2^i could be considered to be within the tail of the central distribution. (b, c, d) Resolution-averaged CC_1/2^(σ–τ), 〈I/σ〉 and R_p.i.m. per cycle, which show significant improvement in the first seven cycles of scaling and filtering, with more gradual improvement towards the end of the ten cycles.
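
The exclusion rule in panel (a) amounts to a one-sided outlier test on the per-group ΔCC_1/2 values. The sketch below shows a single filtering round; the function name `groups_to_remove` and the use of the population standard deviation are assumptions, and in the workflow described by the caption the data would be rescaled before each new round.

```python
import statistics


def groups_to_remove(delta_cc_half, n_sigma=4.0):
    """Return the image groups whose delta-CC1/2 falls below mean - n_sigma * sigma.

    `delta_cc_half` maps a group identifier to its delta-CC1/2 value.
    The population standard deviation is used (an assumption).
    """
    values = list(delta_cc_half.values())
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)
    cutoff = mean - n_sigma * sigma
    return sorted(g for g, v in delta_cc_half.items() if v < cutoff)
```

Only the low tail is tested, because a strongly negative ΔCC_1/2 marks a group whose inclusion degrades the overall CC_1/2.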

**Figure 9**
(a) CC_1/2 and 〈I/σ〉 of the selected data set (after eight cycles of scaling and exclusion). (b) The image ranges retained or removed for the selected data set. The whole of the first two sweeps were removed, in addition to the ends of several other sweeps, removing 3.4% of the reflections from the initial data set.

**Figure 10**
Incremental scaling of 20° rotation sweeps collected *in situ*. (a) Completeness and (b) correlation coefficient for the number of sweeps in the combined data set. Each sweep is added individually to the combined scaled data set measured up to that point, triggering the reference-scaling algorithm. As each sweep is added, the completeness is monitored until a given completeness is obtained across the resolution range (98% in this example, which is obtained after adding sweep 12).

**Figure 11**
The difference in the free R_meas determined using all reflections and using a subset of reflections [ΔR_free = free R_meas − free R_meas(all)] plotted against the minimum number of groups (top) and the number of reflections in the subset (bottom) for (a, d) single-sweep data sets, (b, e) multiple-sweep data sets and (c, f) a multi-crystal data set. For single sweeps a random group selection is used, whereas for multiple-sweep and multi-crystal data a random (R) and a quasi-random (QR) algorithm are tested. Based on the observed trends, two criteria were chosen for the reflection subset to be used for scaling-model optimization: it must contain at least 2000 groups and 50 000 reflections, as indicated by the vertical dashed lines.
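
As an illustration of the quantities in this caption, the sketch below computes R_meas from groups of symmetry-equivalent intensities using the standard multiplicity-corrected definition, and checks a candidate subset against the two thresholds quoted above. The function names and the dictionary layout are hypothetical; a free R_meas would be obtained by applying `r_meas` to a set of reflections held out of scaling-model optimization.

```python
import math


def r_meas(groups):
    """Multiplicity-corrected merging R factor.

    `groups` maps a unique reflection index to the list of scaled
    intensities of its symmetry equivalents. Groups with fewer than
    two observations are skipped, as they carry no agreement information.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue
        mean = sum(intensities) / n
        numerator += math.sqrt(n / (n - 1)) * sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0:
        raise ValueError("no groups with multiplicity of two or more")
    return numerator / denominator


def subset_is_large_enough(groups, min_groups=2000, min_reflections=50000):
    """Check the two subset criteria quoted in the caption."""
    n_reflections = sum(len(v) for v in groups.values())
    return len(groups) >= min_groups and n_reflections >= min_reflections
```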

Grants and funding

R01 GM117126/GM/NIGMS NIH HHS/United States
