IUCrJ. 2024 Mar 1;11(Pt 2):190-201.
doi: 10.1107/S205225252400054X.

Data reduction in protein serial crystallography

Marina Galchenkova et al. IUCrJ.

Abstract

Serial crystallography (SX) has become an established technique for protein structure determination, especially when dealing with small or radiation-sensitive crystals and investigating fast or irreversible protein dynamics. The advent of newly developed multi-megapixel X-ray area detectors, capable of capturing over 1000 images per second, has brought about substantial benefits. However, this advancement also entails a notable increase in the volume of collected data. Today, up to 2 PB of data per experiment could be easily obtained under efficient operating conditions. The combined costs associated with storing data from multiple experiments provide a compelling incentive to develop strategies that effectively reduce the amount of data stored on disk while maintaining the quality of scientific outcomes. Lossless data-compression methods are designed to preserve the information content of the data but often struggle to achieve a high compression ratio when applied to experimental data that contain noise. Conversely, lossy compression methods offer the potential to greatly reduce the data volume. Nonetheless, it is vital to thoroughly assess the impact on data quality and scientific outcomes when employing lossy compression, as it inherently involves discarding information. The evaluation of lossy compression effects on data requires proper data quality metrics. In our research, we assess various approaches for both lossless and lossy compression techniques applied to SX data, and equally importantly, we describe metrics suitable for evaluating SX data quality.

Keywords: data compression; data quality evaluation; data reduction; protein serial crystallography.
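
The abstract's remark that lossless methods struggle on noisy experimental data can be illustrated with a small sketch. It is not taken from the paper: the synthetic 16-bit "frames", the Gaussian noise model (sigma of 10 counts) and the choice of zlib are all assumptions made for illustration only.

```python
import random
import struct
import zlib


def compression_ratio(pixels):
    """Return raw size / zlib-compressed size for 16-bit pixel values."""
    raw = struct.pack(f"<{len(pixels)}H", *pixels)
    return len(raw) / len(zlib.compress(raw, 9))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_pixels = 10_000

# Hypothetical noise-free frame: every pixel holds the same background value.
flat = [100] * n_pixels
# Same background with Gaussian noise (assumed sigma = 10 counts), clamped at zero.
noisy = [max(0, round(rng.gauss(100, 10))) for _ in range(n_pixels)]

ratio_flat = compression_ratio(flat)
ratio_noisy = compression_ratio(noisy)
print(f"flat: {ratio_flat:.0f}x, noisy: {ratio_noisy:.1f}x")
```

The noisy frame compresses far less well because the noise carries several bits of entropy per pixel that a lossless codec must retain.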

Figures

Figure 1
Comparison of electron density maps (2Fo − Fc, contour level σ = 1.5) of lysozyme in the original structure (PDB entry 4et8; 1.90 Å, yellow mesh and model) and the reprocessed data using all frames (1.49 Å, blue mesh and model). (a) and (b) Active-site residue Asp52 could be modeled with an alternative conformation using the reprocessed data. (c) and (d) Another section of the structure around Tyr23 with the same maps as described above (but with contour level σ = 0.8) shows more detailed density for the aromatic amino acids when using the reprocessed data.
Figure 2
Data quality metrics CC* (from 0 to 1, higher is better) and R_split (up to 100%, lower is better) for the different fractions of the measured dataset of lactamase (first row in Table 1). The insets show the histogram of achievable resolution for each pattern.
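
For orientation, the two metrics named in this caption can be sketched from their commonly used definitions: CC* = sqrt(2·CC1/2 / (1 + CC1/2)) and R_split = (1/√2) · Σ|I_even − I_odd| / (½ Σ(I_even + I_odd)), where I_even and I_odd are intensities merged from two random halves of the dataset. The function names and the toy intensity lists below are hypothetical, and returning 0 for a non-positive CC1/2 is a convention assumed here; this is not the paper's processing pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cc_star(i_even, i_odd):
    """CC* derived from CC1/2 between two half-dataset intensity lists."""
    cc_half = pearson(i_even, i_odd)
    if cc_half <= 0:  # assumed convention: no usable correlation maps to 0
        return 0.0
    return math.sqrt(2 * cc_half / (1 + cc_half))


def r_split(i_even, i_odd):
    """R_split as a fraction; multiply by 100 for a percentage."""
    numerator = sum(abs(a - b) for a, b in zip(i_even, i_odd))
    denominator = 0.5 * sum(a + b for a, b in zip(i_even, i_odd))
    return numerator / (math.sqrt(2) * denominator)


# Toy half-dataset intensities (made-up numbers, same reflections in both lists).
half_a = [10.0, 20.0, 30.0]
half_b = [12.0, 18.0, 30.0]
print(f"CC* = {cc_star(half_a, half_b):.3f}, R_split = {100 * r_split(half_a, half_b):.1f}%")
```

In practice both metrics are evaluated per resolution shell by crystallographic software; the sketch only shows the arithmetic on one flat list of reflections.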
Figure 3
Plot of quality metrics R_split and CC* for the original data of thaumatin (see Table 1), binned and rounded to the 1, 2 or 3 most significant bits. Under the plot, the histograms of found peak intensities over 1/d (peakograms) for the different datasets are presented.
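
The two lossy operations named in this caption, binning and rounding to a few most significant bits, can be sketched as follows. This is a minimal illustration under stated assumptions: the caption does not give the binning factor or the rounding rule, so the 2×2 summation and round-to-nearest behaviour below are guesses, and the function names are hypothetical.

```python
def round_to_msb(value, n_bits):
    """Keep the n_bits most significant bits of an integer, rounding to nearest.

    Assumption: round-to-nearest; plain truncation would be an equally
    plausible reading of "rounded to N most significant bits".
    """
    if value < 0:
        return -round_to_msb(-value, n_bits)
    shift = value.bit_length() - n_bits
    if shift <= 0:  # value already fits in n_bits, nothing to discard
        return value
    half = 1 << (shift - 1)
    return ((value + half) >> shift) << shift


def bin_2x2(image):
    """Sum each 2x2 block of pixels (assumed binning mode; even dimensions)."""
    return [
        [
            image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


frame = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
print(bin_2x2(frame))                                # 2x2 block sums
print([round_to_msb(75, bits) for bits in (1, 2, 3)])  # 75 = 0b1001011
```

Rounding leaves the dynamic range intact but reduces the number of distinct pixel values, which is what makes the subsequent lossless stage more effective.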
