Acta Crystallogr A Found Adv. 2023 Mar 1;79(Pt 2):145-162.
doi: 10.1107/S2053273323000682. Epub 2023 Feb 17.

Crystal diffraction prediction and partiality estimation using Gaussian basis functions

Wolfgang Brehm et al. Acta Crystallogr A Found Adv.

Abstract

The recent diversification of macromolecular crystallographic experiments including the use of pink beams, convergent electron diffraction and serial snapshot crystallography has shown the limitations of using the Laue equations for diffraction prediction. This article gives a computationally efficient way of calculating approximate crystal diffraction patterns given varying distributions of the incoming beam, crystal shapes and other potentially hidden parameters. This approach models each pixel of a diffraction pattern and improves data processing of integrated peak intensities by enabling the correction of partially recorded reflections. The fundamental idea is to express the distributions as weighted sums of Gaussian functions. The approach is demonstrated on serial femtosecond crystallography data sets, showing a significant decrease in the required number of patterns to refine a structure to a given error.
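One reason weighted sums of Gaussians are convenient is that they are closed under convolution: convolving two Gaussians adds their means and their variances, so combining two mixtures only requires pairing up their components. The Python sketch below illustrates this general property in one dimension. It is not the authors' implementation; the `beam` and `shape` mixtures and all function names are made-up placeholders chosen for illustration.

```python
import math
from itertools import product

# A 1D Gaussian mixture is represented as a list of
# (weight, mean, variance) tuples. This layout is an assumption
# made for this sketch only.

def mixture_pdf(mixture, x):
    """Evaluate a weighted sum of normalized Gaussians at x."""
    return sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in mixture
    )

def convolve_mixtures(a, b):
    """Convolve two mixtures analytically.

    Each pair of components yields one Gaussian whose weight is the
    product of the weights and whose mean and variance are the sums
    of the means and variances.
    """
    return [
        (wa * wb, ma + mb, va + vb)
        for (wa, ma, va), (wb, mb, vb) in product(a, b)
    ]

# Hypothetical inputs: a two-component "beam" distribution and a
# single-component "peak shape", in arbitrary units.
beam = [(0.7, 0.0, 0.04), (0.3, 0.5, 0.25)]
shape = [(1.0, 0.0, 0.09)]

combined = convolve_mixtures(beam, shape)
for weight, mean, variance in combined:
    print(f"weight={weight:.2f} mean={mean:.2f} variance={variance:.2f}")
```

In three dimensions the same rule applies with covariance matrices in place of the scalar variances, which is why no numerical integration is needed to combine such distributions.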

Keywords: diffraction prediction; merging; partiality estimation; serial snapshot crystallography.

Figures

Figure 1
Geometric construction visualizing how the covariance matrices of the distributions of diffractive power in reciprocal space, and of the volume probed by an incident beam, are built up. The arrows indicate the components, akin to error bars, that the different distributions contribute to the covariance matrix in a 2D cut. The same contributions have a different effect on [formula], [formula] and [formula]; where they have an effect, they are drawn in the same colour as where they were introduced. The distribution of wavelengths in the incident beam leads to a distribution of lengths of [formula]; its standard deviation is drawn with purple arrows. The distribution of incident-beam directions leads to different starting points of [formula] in the Ewald construction; its standard deviation is drawn in red. The scattering power of the crystal is smeared rotationally by mosaicity, drawn with brown arrows, and smeared radially by (a simplified) strain, drawn in cyan. The reciprocal peak shape, depicted in light green, is a stylized shape transform, which is likewise approximated as a Gaussian. To smooth the prediction over a range of output directions, in order to simulate the detector point spread function and facilitate efficient sampling of the signal, a distribution of diffraction directions can be introduced; its standard deviation is drawn in dark blue.
Figure 2
Illustration of the effect of divergence or convergence. Multiple incident-beam directions (three are depicted) with the same wavelength all lie on a spherical cap and produce a nest of Ewald spheres.
Figure 3
Geometric explanation for equation (25) for the expected wavenumber. Convergence, orthogonal to [formula], and wavelength dispersion, in line with [formula], are indicated as a box to highlight the shearing of the covariance when forming the correlated difference between [formula] and [formula] and their respective variances. It can be seen that the length of [formula] projected onto [formula] is [formula], where [formula] is the angle of diffraction.
Figure 4
Comparison between (a) previously published diffraction data from a human serotonin receptor (Liu et al., 2013) and (b) predicted diffraction of the same image region after successful optimization, with estimated background added. Diffraction is predicted using equation (19) with the substitution [formula], corrected for the solid angle with equation (22), with equation (25) used to estimate the expected wavelength, and summed over all significantly excited Miller indices. Intensities are scaled according to the reference intensities deposited in the PDB (Protein Data Bank) under 4NC3. The bandwidth of the X-ray beam is estimated to be about 0.1% [LCLS states 0.2% ΔE/E FWHM for the CXI beamline (LCLS, 2022)].
Figure 5
Comparison between (a) diffraction data (unpublished) of selenobiotine-bound streptavidin crystals and (b) predicted diffraction of the same image region with estimated background added. Diffraction is predicted using equation (19) with the substitution [formula], corrected for the solid angle with equation (22), with equation (25) used to estimate the expected wavelength, and summed over all significantly excited Miller indices. The diffraction was measured at ESRF with a 1M Jungfrau detector using a pink beam with 5% bandwidth FWHM. The structure factors for the prediction are taken from the streptavidin–norbiotin complex structure deposited under 1LCV in the PDB (Pazy et al., 2002).
Figure 6
Histogram of measured integrated intensities of data set 1 in black (without overprediction) and red (with overprediction) overlaid with the Cauchy outlier distribution (γ = 1967.7) in blue. The outlier distribution was chosen so as to describe the measurements well, but also to reserve some probability especially for the extreme values. Note that the additional intensities due to overprediction are mostly small.
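To illustrate what "reserving some probability for the extreme values" means, the Python sketch below compares the tail of a Cauchy distribution using the quoted scale γ = 1967.7 with the tail of a Gaussian. The location parameter of zero and the choice of a Gaussian with standard deviation equal to γ are assumptions made only for this comparison; the caption does not state how the distribution enters the authors' error model.

```python
import math

GAMMA = 1967.7  # Cauchy scale parameter quoted in the caption

def cauchy_pdf(x, gamma=GAMMA, x0=0.0):
    """Cauchy density; the location x0 = 0 is an assumption."""
    return 1.0 / (math.pi * gamma * (1.0 + ((x - x0) / gamma) ** 2))

def cauchy_cdf(x, gamma=GAMMA, x0=0.0):
    """Cumulative distribution function of the Cauchy distribution."""
    return 0.5 + math.atan((x - x0) / gamma) / math.pi

def cauchy_upper_tail(x, gamma=GAMMA, x0=0.0):
    """Probability of observing a value larger than x."""
    return 1.0 - cauchy_cdf(x, gamma, x0)

def gaussian_upper_tail(x, sigma, mu=0.0):
    """Upper-tail probability of a Gaussian, for comparison only."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

# Probability mass beyond ten scale widths under each model.
far_out = 10.0 * GAMMA
print("Cauchy tail:  ", cauchy_upper_tail(far_out))
print("Gaussian tail:", gaussian_upper_tail(far_out, sigma=GAMMA))
```

The Cauchy tail decays only as 1/x, so a measurement far from the bulk of the data keeps a non-negligible probability under this model, whereas a Gaussian would assign it a vanishingly small one.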
Figure 7
Predicted intensities versus measured intensities, with the photon counting error estimates indicated by blue error bars and the corrected error estimates by grey error bars. Data points treated as outliers are shown in red; those treated as regular data points are shown in blue. The black line shows where the points would lie if the predictions were in perfect agreement with the measurements. (a) shows the first 1000 intensities as recorded in the granulin data set (data set 1). (b) shows the intensities and predictions for the crystal with the strongest diffraction in the same data set.
Figure 8
A scatter plot of a subset of predicted versus measured partialities with an estimated photon counting and background subtraction error of less than 1/8 in the granulin data set (data set 1). The first 10 000 intensities from the data set, in the order they were recorded, were chosen to make the result as reproducible as possible.
Figure 9
Predicted partialities compared with measured partialities, with photon counting error estimates indicated by error bars. Displayed are the first 993 values from data set 1, in the order they were recorded, that have an estimated photon counting and background subtraction error of less than 1/4. The black line shows where the points would lie if the predictions were in perfect agreement with the measurements.
Figure 10
Histogram of partialities measured with an estimated photon counting and background subtraction error of less than 1/8 from the granulin data set (data set 1).
Figure 11
10 000 random pairs of predicted and measured intensities from the random half data set of data set 1 that was used to fit all parameters.
Figure 12
10 000 random pairs of predicted and measured intensities using the parameters determined from the random half data set of data set 1 used in Fig. 11. Note the slightly reduced correlation compared with Fig. 11.
Figure 13
Comparison of structure refinement results for the granulin data set (data set 1), refined with phenix 1.18-3855 to a resolution of 1.8 Å, for MGPCII (green) and partialator 0.9 (violet). The bold dots represent R free and the small circles represent R work. The partiality model ggpm gave the best result for partialator for all sizes of subsets that were tested.
Figure 14
Maximum HySS correlation coefficient found during automatic SAD phasing using phenix.autosol from A2A crystals (Nass, 2020) as a function of the number of crystals used during merging. The entries in green are for MGPCII, whereas the violet dots represent the results of partialator.
Figure 15
R factors of the refinement of structures built during automatic SAD phasing using phenix.autosol from A2A crystals (Nass, 2020) as a function of the number of crystals used. The entries in green are for MGPCII, whereas the violet dots represent the results of partialator. The solid dots are R free and the open circles are R work.
Figure 16
An illustration of region growing for identifying reflections with a significant contribution to the diffraction. The grey gridlines intersect at integer combinations that are the Miller indices of the reflections in reciprocal space. The Ewald sphere, or more generally the diffraction condition, is assumed to be a smooth function and much thinner in one dimension than in the others. It is caricatured as an ellipse sector in black. The algorithm starts at any of the light red or light blue squares. For each blue square, that is, each square that intersects the diffraction condition at any point, the diffraction condition is evaluated at the exact Miller index. A significant contribution is indicated with a blue dot, an insignificant contribution with a red dot. For each blue square, all new neighbours are inspected for intersections in the same manner. Squares that do not intersect the diffraction condition at any point are coloured light red and do not prompt the inspection of their neighbours.
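The region-growing idea in this caption can be sketched in a few lines of Python. The version below is a simplified 2D illustration under stated assumptions, not the authors' code: the diffraction condition is modelled as a thin circular shell with a Gaussian radial profile, the seed is the origin (which lies on the Ewald circle by construction), neighbours include diagonals, and `max_index` is a hypothetical cut-off standing in for a resolution limit. All names and default values are invented for this example.

```python
import math
from collections import deque

def distance_range(cell, centre):
    """Min and max distance from `centre` to the unit square around `cell`."""
    (h, k), (cx, cy) = cell, centre
    # Nearest point of the square: clamp the centre into the square.
    nx = min(max(cx, h - 0.5), h + 0.5)
    ny = min(max(cy, k - 0.5), k + 0.5)
    # Farthest point is the corner on the far side along each axis.
    fx = h - 0.5 if cx >= h else h + 0.5
    fy = k - 0.5 if cy >= k else k + 0.5
    return (math.hypot(nx - cx, ny - cy), math.hypot(fx - cx, fy - cy))

def grow_region(seed, centre, radius, sigma, threshold=0.01, max_index=50):
    """Collect lattice points whose weight on the shell is >= threshold."""
    # Radial half-width at which the Gaussian weight drops to `threshold`.
    half_width = sigma * math.sqrt(-2.0 * math.log(threshold))

    def intersects(cell):
        d_min, d_max = distance_range(cell, centre)
        return d_min <= radius + half_width and d_max >= radius - half_width

    def weight(cell):
        d = math.hypot(cell[0] - centre[0], cell[1] - centre[1])
        return math.exp(-0.5 * ((d - radius) / sigma) ** 2)

    visited = {seed}
    queue = deque([seed])
    significant = {}
    while queue:
        cell = queue.popleft()
        if not intersects(cell):
            continue  # "light red" square: do not inspect its neighbours
        w = weight(cell)  # evaluate at the exact Miller index
        if w >= threshold:
            significant[cell] = w  # "blue dot"
        for dh in (-1, 0, 1):
            for dk in (-1, 0, 1):
                nb = (cell[0] + dh, cell[1] + dk)
                if nb not in visited and max(abs(nb[0]), abs(nb[1])) <= max_index:
                    visited.add(nb)
                    queue.append(nb)
    return significant

# Hypothetical Ewald circle of radius 20 passing through the origin.
found = grow_region((0, 0), centre=(-20.0, 0.0), radius=20.0, sigma=0.1)
print(len(found), "significant lattice points")
```

Because the search only ever expands from squares that touch the shell, the number of lattice points inspected scales with the size of the shell rather than with the full volume of reciprocal space inside the cut-off.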
