J Struct Biol. 2022 Sep;214(3):107875.
doi: 10.1016/j.jsb.2022.107875. Epub 2022 Jun 17.

Precision requirements and data compression in CryoEM/CryoET


Adam C Fluty et al. J Struct Biol. 2022 Sep.

Abstract

With larger, higher speed detectors and improved automation, individual CryoEM instruments are capable of producing a prodigious amount of data each day, which must then be stored, processed and archived. While it has become routine to use lossless compression on raw counting-mode movies, the averages which result after correcting these movies no longer compress well. These averages could be considered sufficient for long term archival, yet they are conventionally stored with 32 bits of precision, despite high noise levels. Derived images are similarly stored with excess precision, providing an opportunity to decrease project sizes and improve processing speed. We present a simple argument based on propagation of uncertainty for safe bit truncation of flat-fielded images combined with lossless compression. The same method can be used for most derived images throughout the processing pipeline. We test the proposed strategy on two standard, data-limited CryoEM data sets, demonstrating that these limits are safe for real-world use. We find that 5 bits of precision is sufficient for virtually any raw CryoEM data and that 8-12 bits is sufficient for intermediate averages or final 3-D structures. Additionally, we detail and recommend specific rules for discretization of data as well as a practical compressed data representation that is tuned to the specific needs of CryoEM.
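The strategy summarized in the abstract (discard bits that only encode noise, then apply lossless compression) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the synthetic Gaussian "pixels", the linear mapping clipped at mean ± 4σ, the helper names `discretize`, `restore`, and `compressed_size`, and the use of zlib as the lossless stage. It is not the paper's exact discretization rule or file format.

```python
import random
import statistics
import struct
import zlib


def discretize(values, bits, nsigma=4.0):
    """Linearly map floats onto 2**bits integer levels.

    Assumption for this sketch: the range is clipped at mean +/- nsigma*std.
    Returns the integer levels plus the offset and step needed to restore.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    lo, hi = mean - nsigma * std, mean + nsigma * std
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    quantized = []
    for v in values:
        clipped = min(max(v, lo), hi)
        quantized.append(round((clipped - lo) / step))
    return quantized, lo, step


def restore(quantized, lo, step):
    """Map integer levels back to approximate floating-point values."""
    return [lo + q * step for q in quantized]


def compressed_size(raw_bytes):
    """Size in bytes after lossless (zlib/deflate) compression."""
    return len(zlib.compress(raw_bytes, 9))


# Synthetic stand-in for a noisy image: Gaussian noise, mean 10, sigma 3.
rng = random.Random(0)
pixels = [rng.gauss(10.0, 3.0) for _ in range(20000)]

# Baseline: the same values stored as 32-bit floats.
float_bytes = struct.pack(f"<{len(pixels)}f", *pixels)

# 5-bit discretization; each level fits in one byte before compression.
q5, lo, step = discretize(pixels, bits=5)
int_bytes = bytes(q5)

size_float = compressed_size(float_bytes)
size_5bit = compressed_size(int_bytes)

# RMS difference between the original and restored values.
restored = restore(q5, lo, step)
rms_error = (sum((a - b) ** 2 for a, b in zip(pixels, restored)) / len(pixels)) ** 0.5

print(f"float32 + zlib: {size_float} bytes")
print(f"5-bit   + zlib: {size_5bit} bytes")
print(f"RMS error: {rms_error:.3f} (quantization step {step:.3f}, noise sigma 3.0)")
```

In this toy setting the discretized data compress far better than the float32 bytes, because the low-order mantissa bits of noisy floats are essentially random, and the rounding error stays well below the noise level. Whether a given bit depth is safe for real data is the question the paper addresses with reconstructions, not something this sketch can show.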

Keywords: CryoET; Data compression; Hdf5; Image processing; cryoEM.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1.
Measurement of the maximum potential impact of real-space representational error in Fourier space using FSC curves when compression is applied to random Gaussian noise. A) Bit truncation and B) JPEG compression.
Figure 2.
Comparison of full reconstruction volumes (upper) and a selected subarea (lower) with previously published atomic model of β-gal from reconstructions with micrographs after A) no compression, B) 5 bit discretization, C) 3 bit discretization, and D) JPEG 68 compression. Comparison of full reconstruction volumes (upper) and a selected subarea (lower) from TrpV1 reconstructions with micrographs after E) no compression, F) 5 bit discretization, and G) 3 bit discretization. The black-colored surfaces indicate the location of the sub-volume in the full map. Iso-surface colors correspond to the line colors in Fig. 3–4.
Figure 3.
The effect of various levels of bit truncation and JPEG compression on even-odd Fourier shell correlation (FSC) of 3-D reconstructions for A) β-gal after different levels of bit truncation and B) β-gal after different levels of JPEG compression, and map vs model Fourier shell correlation curves of C) bit truncated and D) JPEG compressed β-gal dataset reconstructions. Inset plots show magnified views of the same curves intersecting with the corresponding FSC cut-off value as indicated by the rectangular annotation.
Figure 4.
TrpV1 reconstruction FSC curves. A) The effect of various levels of bit truncation on internal Fourier shell correlation. B) The run-to-run variation of TrpV1 even-odd FSC curves when performing a sequence of 3 identical reconstructions of the uncompressed data.
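
The kind of measurement described for Figure 1 (correlating Gaussian noise with its bit-truncated copy in Fourier space) can be sketched in one dimension. This is only an analog under stated assumptions: the paper computes FSC curves on images and volumes, whereas this sketch correlates bands of a 1-D discrete Fourier transform. The helper names `truncate`, `half_dft`, and `band_correlation`, the ± 4σ clipping range, and the band width are all hypothetical choices made for illustration.

```python
import cmath
import math
import random


def truncate(values, bits, nsigma=4.0):
    """Snap values to 2**bits evenly spaced levels (assumed +/- nsigma range)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    lo = mean - nsigma * std
    top = (1 << bits) - 1
    step = 2 * nsigma * std / top
    return [lo + min(max(round((v - lo) / step), 0), top) * step for v in values]


def half_dft(signal):
    """Naive DFT coefficients for frequencies 1 .. n/2 (DC term skipped)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(1, n // 2 + 1)
    ]


def band_correlation(a, b, band_width=32):
    """Normalized cross-correlation of two signals per frequency band."""
    fa, fb = half_dft(a), half_dft(b)
    curve = []
    for start in range(0, len(fa), band_width):
        pa = fa[start:start + band_width]
        pb = fb[start:start + band_width]
        cross = sum((x * y.conjugate()).real for x, y in zip(pa, pb))
        norm = math.sqrt(sum(abs(x) ** 2 for x in pa) * sum(abs(y) ** 2 for y in pb))
        curve.append(cross / norm)
    return curve


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(512)]

curve_5bit = band_correlation(noise, truncate(noise, 5))
curve_3bit = band_correlation(noise, truncate(noise, 3))

print("5-bit:", [f"{c:.4f}" for c in curve_5bit])
print("3-bit:", [f"{c:.4f}" for c in curve_3bit])
```

Because the input is pure noise with no underlying signal, any loss of correlation comes from the representation error alone. A curve that stays close to 1 across all bands suggests the truncation error is small relative to the noise at every frequency, and coarser truncation lowers the curve.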

