J Appl Crystallogr. 2014 May 29;47(Pt 3):1118-1131.
doi: 10.1107/S1600576714007626. eCollection 2014 Jun 1.

Cheetah: software for high-throughput reduction and analysis of serial femtosecond X-ray diffraction data

Anton Barty et al. J Appl Crystallogr.

Abstract

The emerging technique of serial X-ray diffraction, in which diffraction data are collected from samples flowing across a pulsed X-ray source at repetition rates of 100 Hz or higher, has necessitated the development of new software to handle the large data volumes produced. Sorting data according to different criteria and rapidly filtering events to retain only diffraction patterns of interest significantly reduce data volume, thereby simplifying subsequent data analysis and management tasks. Meanwhile, the generation of reduced data in the form of virtual powder patterns, radial stacks, histograms and other metadata creates data set summaries for analysis and overall experiment evaluation. Rapid data reduction early in the analysis pipeline is proving to be an essential first step in serial imaging experiments, prompting the authors to make the tool described in this article available to the general community. Originally developed for experiments at X-ray free-electron lasers, the software is based on a modular facility-independent library to promote portability between different experiments and is available under version 3 or later of the GNU General Public License.

Keywords: free-electron lasers; serial X-ray diffraction; serial crystallography.

Figures

Figure 1
(a) ‘Raw’ non-interpolated layout of detector data (in this case for the CSPAD detector) with well defined module boundaries as internally represented for data processing. (b) ‘Assembled’ layout of the same modules as mounted on the physical detector system. A pixel map containing the coordinates of each data pixel in a suitably defined laboratory coordinate system is used to map between data in ‘raw’ layout and pixel locations in physical space.
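The role of the pixel map can be illustrated with a minimal sketch. It assumes the pixel map is supplied as two flat lists of coordinates in pixel units and uses nearest-pixel rounding in place of true interpolation; the function name `assemble` and its arguments are hypothetical and are not taken from Cheetah.

```python
def assemble(raw, pix_x, pix_y, fill=0.0):
    """Place raw-layout pixels onto a regular grid using a pixel map.

    raw, pix_x and pix_y are flat lists of equal length; pix_x[i] and
    pix_y[i] give the physical position of raw[i] in pixel units.
    Nearest-pixel rounding is an assumption made for this illustration.
    """
    # Shift coordinates so the smallest coordinate maps to index 0.
    min_x, min_y = min(pix_x), min(pix_y)
    cols = [int(round(x - min_x)) for x in pix_x]
    rows = [int(round(y - min_y)) for y in pix_y]

    # Grid cells that no raw pixel maps to (module gaps) keep the fill value.
    image = [[fill] * (max(cols) + 1) for _ in range(max(rows) + 1)]
    for value, r, c in zip(raw, rows, cols):
        image[r][c] = value
    return image
```

Cells left at the fill value correspond to gaps between modules, which is one reason downstream analysis has to know where real data ends.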
Figure 2
Virtual powder diffraction data (see §2.5.3) from many lysozyme nanocrystals in (a) the ‘raw’ non-interpolated layout of detector data and (b) the ‘assembled’ layout. Assembling a physically correct image requires interpolation of the raw data onto a regular pixel grid and results in irregular locations of individual module boundaries, due to the moveable central hole and mechanical tolerances in the placement of individual modules. The gaps between detector modules need to be accounted for in the analysis, and module geometry may be refined during subsequent analysis. For these reasons data analysis is performed in raw layout whenever possible.
Figure 3
Frames identified as non-hits are added to a buffer with a depth of n images. A pixel-wise median through this buffer estimates the current photon background signal. Hot pixels and the standard deviation of the background intensity are calculated from the same buffer.
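The buffered median can be sketched as follows. This is an illustration under stated assumptions rather than Cheetah's implementation: frames are flat lists of pixel values, and the class name `RunningBackground` and its methods are hypothetical.

```python
from collections import deque
from statistics import median


class RunningBackground:
    """Running background estimate from the n most recent non-hit frames."""

    def __init__(self, depth):
        # A deque with maxlen discards the oldest frame automatically.
        self.buffer = deque(maxlen=depth)

    def add_non_hit(self, frame):
        self.buffer.append(list(frame))

    def estimate(self):
        """Return the pixel-wise median through the buffer."""
        if not self.buffer:
            raise ValueError("no non-hit frames buffered yet")
        return [median(pixel_history) for pixel_history in zip(*self.buffer)]

    def subtract(self, frame):
        """Return the frame with the current background estimate removed."""
        return [v - b for v, b in zip(frame, self.estimate())]
```

A median is used rather than a mean so that an occasional bright frame that slipped through as a non-hit does not skew the estimate.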
Figure 4
In the case of data from crystalline samples forming well defined Bragg peaks, the local background in the vicinity of a pixel is estimated as the median of pixel values in a box of side length 2r + 1 centred on the pixel of interest, i.e. extending r pixels to either side. For small peaks (a), the median of pixels within this box serves as a reasonable blind estimate of the background signal. However, when the peak becomes large compared to the box size (b), a simple median no longer serves to estimate the background alone. For our experiments to date, we have found that the number of pixels in the box should be at least three times the number of pixels in the peak.
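A minimal sketch of the local median estimate is given below, assuming the image is a list of rows and that the box is simply clipped at the image edge. Detector module boundaries are ignored here, and the function names are hypothetical.

```python
from statistics import median


def local_background(image, row, col, r):
    """Median of the (2r + 1) x (2r + 1) box centred on (row, col).

    The box is clipped where it would extend past the image edge.
    """
    n_rows, n_cols = len(image), len(image[0])
    values = [
        image[i][j]
        for i in range(max(0, row - r), min(n_rows, row + r + 1))
        for j in range(max(0, col - r), min(n_cols, col + r + 1))
    ]
    return median(values)


def box_large_enough(r, peak_pixels):
    """Rule of thumb from the caption: box area >= 3 x peak area."""
    return (2 * r + 1) ** 2 >= 3 * peak_pixels
```

For a one-pixel peak on a flat background, the median of the surrounding box returns the background level because the peak pixel is a single outlier among the box values.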
Figure 5
Comparison of results from running background subtraction and local background subtraction for crystalline samples flowing in a water jet. (a) Image after subtraction of a water ring averaged over multiple frames; fluctuations in pulse intensity and water jet structure result in imperfect background subtraction using running background subtraction. (b) Subtraction of local background using a moving median filter of width 7 pixels produces a cleaner image for peak detection.
Figure 6
Cleaned and assembled image data of diffraction from a single protein nanocrystal, as viewed in the viewer provided for reviewing Cheetah output.
Figure 7
Statistics on hits identified in a given run in the form of hit rate (a) and distribution of resolution (b).
Figure 8
(a) One individual frame of background-corrected diffraction from a single lysozyme nanocrystal at the LCLS and (b) the virtual powder diffraction pattern formed by summing many thousands of individual background-corrected frames.
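The summation that forms a virtual powder pattern can be sketched in a few lines. The sketch assumes each frame is a flat list of background-corrected pixel values of equal length; the function name `virtual_powder` is hypothetical.

```python
def virtual_powder(frames):
    """Sum background-corrected frames pixel by pixel."""
    frames = iter(frames)
    try:
        total = list(next(frames))
    except StopIteration:
        raise ValueError("at least one frame is required") from None
    for frame in frames:
        for i, value in enumerate(frame):
            total[i] += value
    return total
```

Because the frames are consumed one at a time, only the running sum needs to be held in memory, which matters when many thousands of frames are accumulated.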
Figure 9
Radial stacks summarize the radially averaged signal for each frame (a) prior to normalization for shot-to-shot variation and (b) after normalization and outlier rejection. Radial stacks are used for WAXS/SAXS analysis and for comparing powder diffraction patterns on a shot-by-shot basis, and when sorted by laser delay or other reaction coordinates facilitate data evaluation in time-resolved studies.
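A radial average of a single frame might be computed as sketched below; stacking one such profile per frame gives a radial stack. The sketch assumes pixel coordinates are measured from the beam centre in pixel units and bins pixels by integer radius. The function name and parameters are hypothetical, and normalization and outlier rejection are omitted.

```python
from math import hypot


def radial_average(values, pix_x, pix_y, n_bins, bin_width=1.0):
    """Average pixel values in rings of equal radial width.

    values, pix_x and pix_y are flat lists of equal length. Pixels beyond
    the last bin are ignored; empty bins are reported as 0.0.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for value, x, y in zip(values, pix_x, pix_y):
        bin_index = int(hypot(x, y) / bin_width)
        if bin_index < n_bins:
            sums[bin_index] += value
            counts[bin_index] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def radial_stack(frames, pix_x, pix_y, n_bins):
    """One radial profile per frame, in frame order."""
    return [radial_average(f, pix_x, pix_y, n_bins) for f in frames]
```

Sorting the rows of such a stack by laser delay or another reaction coordinate then gives the kind of view the caption describes for time-resolved studies.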
Figure 10
Cheetah implementation is multi-tiered. At the top level (a), Cheetah contains programs that interface to facility-dependent file formats and real-time data streams, translating and repackaging data from facility data formats for use by the Cheetah processing engine (b). Adaptation of this front end is all that is required to implement Cheetah with other facility data systems and file formats. The processing engine (b) is written in a facility-independent manner and compiled as a callable library, whilst core low-level functions (c) are implemented in plain C wherever possible to facilitate reuse of individual modules.
Figure 11
Control panel for Cheetah operation, showing currently available data sets, the progress of data processing and commonly used functions such as data reviewing.
Figure 12
Event selection in serial X-ray diffraction experiments, borrowing from terminology used in particle physics experiments. Level-1 veto uses external diagnostics to determine whether a sample has been intersected by the X-ray pulse, while Level-2 vetoing relies on readout of only a portion of the detector. Level-3 event filters work in parallel, performing rapid analysis of the entire detector data to determine whether a particular event is worthy of retention for further analysis. Cheetah currently performs the role of a Level-3 event filter in addition to performing data reduction tasks.