Ultramicroscopy. 2013 Dec;135:24-35.
doi: 10.1016/j.ultramic.2013.06.004. Epub 2013 Jun 21.

High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy

Shaoxia Chen et al. Ultramicroscopy. 2013 Dec.

Abstract

Three-dimensional (3D) structure determination by single particle electron cryomicroscopy (cryoEM) involves the calculation of an initial 3D model, followed by extensive iterative improvement of the orientation determination of the individual particle images and the resulting 3D map. Because there is much more noise than signal at high resolution in the images, this creates the possibility of noise reinforcement in the 3D map, which can give a false impression of the resolution attained. The balance between signal and noise in the final map at its limiting resolution depends on the image processing procedure and is not easily predicted. There is a growing awareness in the cryoEM community of how to avoid such over-fitting and over-estimation of resolution. Equally, there has been a reluctance to use the two principal methods of avoidance because they give lower resolution estimates, which some people believe are too pessimistic. Here we describe a simple test that is compatible with any image processing protocol. The test allows measurement of the amount of signal and the amount of noise from overfitting that is present in the final 3D map. We have applied the method to two different sets of cryoEM images of the enzyme beta-galactosidase using several image processing packages. Our procedure involves substituting the Fourier components of the initial particle image stack beyond a chosen resolution by either the Fourier components from an adjacent area of background, or by simple randomisation of the phases of the particle structure factors. This substituted noise thus has the same spectral power distribution as the original data. Comparison of the Fourier Shell Correlation (FSC) plots from the 3D map obtained using the experimental data with that from the same data with high-resolution noise (HR-noise) substituted allows an unambiguous measurement of the amount of overfitting and an accompanying resolution assessment. 
A simple formula can be used to calculate an unbiased FSC from the two curves, even when a substantial amount of overfitting is present. The approach is software independent. The user is therefore completely free to use any established method or novel combination of methods, provided the HR-noise test is carried out in parallel. Applying this procedure to cryoEM images of beta-galactosidase shows that overfitting varies greatly depending on the procedure, but the best case shows no overfitting and a resolution of ~6 Å.
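The abstract does not spell out the formula. A minimal sketch follows, assuming the commonly quoted form of the relation, FSCtrue = (FSCt − FSCn) / (1 − FSCn), where FSCt is the half-map FSC from the experimental data and FSCn is the FSC from the HR-noise-substituted data. The function name `fsc_true` and the NaN handling are illustrative choices, not part of the published method.

```python
import numpy as np

def fsc_true(fsc_t, fsc_n):
    """Estimate an overfitting-corrected FSC per shell.

    Assumed relation: FSCtrue = (FSCt - FSCn) / (1 - FSCn).
    fsc_t: FSC between half maps from the experimental images.
    fsc_n: FSC between half maps from the HR-noise-substituted images.
    """
    fsc_t = np.asarray(fsc_t, dtype=float)
    fsc_n = np.asarray(fsc_n, dtype=float)
    denom = 1.0 - fsc_n
    # Shells where FSCn == 1 leave the ratio undefined; mark them as NaN.
    out = np.full_like(fsc_t, np.nan)
    ok = np.abs(denom) > 1e-12
    out[ok] = (fsc_t[ok] - fsc_n[ok]) / denom[ok]
    return out
```

The correction is only informative for shells beyond the substitution resolution, where FSCn measures correlation arising from overfitted noise; at lower resolution the two data sets are identical and FSCn simply tracks FSCt.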

Keywords: Beta-galactosidase; Electron cryomicroscopy; Overfitting; Resolution; Single particle; Validation.

Figures

Fig. 1
(a) Part of micrograph 01.49.47 recorded at 300 keV on a Falcon II detector showing a field of view of beta-galactosidase particles embedded in ice, with 2.7 μm defocus, (b) average radial power spectra (intensities) of 3200 tightly masked (170×170 Å² box) particle images from 5 micrographs similar to that shown in (a) compared with the same number of background regions. The lower line shows the power in the particles after subtraction of background. Signal is about equal to background at 30 Å resolution, about 40× less than background at 10 Å and 100× less than background at 5 Å resolution. (c) 10 individual particles selected from the micrograph, (d) same particles after HR-noise substitution beyond 10 Å from the adjacent empty areas shown in bottom row, (e) with random phases beyond 10 Å, (f) with HR-noise beyond 17 Å, (g) with random phases beyond 17 Å, and (h) adjacent noise areas used for HR-noise substitution. Since the signal is less or much less than the noise at 17 or 10 Å resolution, the particle images look very similar by eye. Scale bars 200 Å.
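The random-phase variant of the substitution shown in panels (e) and (g) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `randomise_phases_beyond` and its parameters are hypothetical, and the random phases are taken from the transform of a real white-noise image, which is one convenient way to keep the output real-valued. Amplitudes are left untouched, so the spectral power distribution is unchanged.

```python
import numpy as np

def randomise_phases_beyond(image, pixel_size, cutoff_res, rng=None):
    """Randomise Fourier phases of a 2D image beyond cutoff_res (in Å).

    pixel_size is in Å per pixel. Amplitudes are preserved, so the
    power spectrum of the output matches that of the input.
    """
    rng = np.random.default_rng() if rng is None else rng
    ny, nx = image.shape
    # Spatial frequency (1/Å) of every Fourier pixel.
    fy = np.fft.fftfreq(ny, d=pixel_size)[:, None]
    fx = np.fft.fftfreq(nx, d=pixel_size)[None, :]
    high = np.hypot(fy, fx) > 1.0 / cutoff_res

    ft = np.fft.fft2(image)
    # Phases from the transform of real noise are Hermitian-symmetric,
    # so the substituted transform still corresponds to a real image.
    noise_ft = np.fft.fft2(rng.standard_normal(image.shape))
    random_phase = np.exp(1j * np.angle(noise_ft))
    ft_sub = np.where(high, np.abs(ft) * random_phase, ft)
    return np.fft.ifft2(ft_sub).real
```

Applied to every particle in the stack with the same cut-off (10 Å or 17 Å in the figure), this yields the HR-noise data set that is then processed in parallel with the original images.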
Fig. 2
(a) and (b) Results of Frealign processing of 6733 single particle images of beta-galactosidase recorded at 80 keV on film. The FSC between half data sets (red symbols) is compared with that obtained from the same data set with HR-noise substituted beyond 17 Å (blue and green symbols). Overfitting is shaded blue, with the difference between the two curves, representing real features of the structure, shaded pink. (a) FSCs between half data sets with position and orientation refinement to 7 Å resolution. (b) For the same 6733 particles, FSCs between half data sets with position and orientation refinement to 17 Å resolution. No overfitting is present beyond 17 Å in (b) because that information was not used in refinement. Regardless of the cut-off resolution used in refinement, the plots show the images contain structural information to about 13 Å resolution. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
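For readers unfamiliar with the quantity plotted here, the FSC between two half-data-set maps can be computed along the following lines. This is a generic sketch using the standard definition (normalised cross-correlation of Fourier coefficients within each resolution shell); the function name and the choice of one-voxel-wide shells are assumptions, and individual packages differ in detail.

```python
import numpy as np

def fourier_shell_correlation(map1, map2):
    """FSC per shell for two cubic 3D maps of identical size.

    Returns (shell_indices, fsc). Shell s holds the Fourier voxels
    whose integer radius rounds to s.
    """
    n = map1.shape[0]
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)
    k = np.fft.fftfreq(n) * n  # integer frequency index along each axis
    kz, ky, kx = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    n_shells = n // 2 + 1
    keep = shell < n_shells  # discard corners beyond Nyquist
    idx = shell[keep]
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[keep],
                        minlength=n_shells)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[keep], minlength=n_shells)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[keep], minlength=n_shells)
    denom = np.sqrt(p1 * p2)
    # Shells with no power are reported as zero correlation.
    fsc = np.divide(cross, denom, out=np.zeros(n_shells), where=denom > 0)
    return np.arange(n_shells), fsc
```

For a box of n voxels with a given pixel size in Å, shell s (s > 0) corresponds to a resolution of n × pixel size / s Å.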
Fig. 3
Results of processing of 43758 single particle images of beta-galactosidase recorded at 300 keV on the FEI Falcon II detector, using four different procedures. The FSC from the particle data set (red symbols) is compared in each case with that obtained from the same data set with HR-noise substituted beyond 10 Å (blue symbols) or in one case beyond 8.5 Å. Overfitting is shaded blue, with the difference between the two curves, representing correlations between real features of the structure, shaded pink. (a) Results of Frealign processing. To produce more noticeable overfitting, a value of −200 Å² for the parameter RB-factor was used, noting that this is explicitly not recommended for normal practice. Data out to 7 Å resolution was used in orientation determination, so the overfitting is only evident between 10 and 7 Å resolution. Calculation of FSCtrue from FSCt and FSCn demonstrates a resolution around 6.9 Å. In this case, the very small degree of overfitting does not affect the estimated resolution. (b) The Relion package has been used with complete separation of the data into two halves and gold-standard FSC weighting to carry out low-pass filtering of the reference at each cycle as described. With the gold-standard procedure and gold-standard FSC weighting, there is no overfitting, confirmed by values of FSCn that are zero beyond 10 Å. The map shows 6.4 Å resolution. (c) The Xmipp package has been used with a single reference and “sub-optimal FSC” weighting to apply low-pass filtering at each cycle. With “sub-optimal” FSC weighting, however, some overfitting and an exaggerated resolution are seen. (d) Results of processing the same data using a new program still under development (McMullan, unpublished) and configured to show substantial overfitting when refined out to 5 Å. FSCtrue (green symbols) shows the true resolution of the structure after removing the effect of overfitting on FSCt. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 4
Effect of increasingly tight masking on FSCt and FSCn, showing the usefulness of Eq. (4) in deconvoluting the effect of the mask to reveal the true FSC for the experimental density inside the mask (FSCtrue). (a) FSCt curves between 3D maps from two halves of a data set. (b) FSCn between 3D maps from the two halves of the HR-noise data set. (c) FSCtrue curves calculated from the experimental plots shown in (a) and (b) using Eq. (4). The four curves were obtained by application of different masks to the same density map. The densities and shapes of features in the map are unaffected by masking. The red symbols are for unmasked maps. The blue symbols are for maps with a soft spherical mask slightly bigger than the molecule. The green symbols are for maps with a soft mask that follows the molecular shape. The mask fall-off profile for both soft masks was a cosine half-bell of width 6 pixels. The orange symbols are for maps with a steeper mask profile that follows the molecular shape, with a cosine half-bell of width 3 pixels. The FSC increases as background regions are excluded, with the resolution judged at 0.143 FSC increasing from 6.5 Å to 5.6 Å with an optimal mask. The tightest mask had a relatively steep profile that introduced false features in the FSCt and FSCn plots, but these are effectively removed by calculating FSCtrue, so that the orange and green FSCtrue curves in (c) are very similar. An even steeper (e.g. binary) mask produces much larger artefacts in FSC and the deconvolution is then inaccurate. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
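The soft spherical mask described in this legend can be approximated as below. The sketch assumes that "cosine half-bell" means a raised-cosine fall-off from 1 to 0 over the stated width, and that the mask is centred on voxel n // 2; both are assumptions, as is the function name. A shape-following mask would need a molecular envelope and is not attempted here.

```python
import numpy as np

def soft_spherical_mask(n, radius, edge_width):
    """Spherical mask in an n^3 box with a raised-cosine soft edge.

    The mask is 1 inside `radius` (pixels), falls smoothly to 0 over
    `edge_width` pixels, and is 0 beyond radius + edge_width.
    """
    c = n // 2  # assumed centre convention
    z, y, x = np.indices((n, n, n))
    r = np.sqrt((x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2)
    # t runs from 0 at the inner radius to 1 at the outer edge.
    t = np.clip((r - radius) / edge_width, 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * t))
```

With `edge_width=6` this corresponds to the gentler profile in the legend and with `edge_width=3` to the steeper one. Multiplying both half maps by the same mask before computing the FSC gives the masked FSCt and FSCn curves.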
Fig. 5
Summary of four recommended validation tools. Each panel is from a map chosen to show the value of each tool most clearly. (a) and (b) show comparisons of experimentally determined maps of beta-galactosidase with an atomic model, whereas (c) and (d) show tools that can be used with maps where atomic coordinates are not available. If atomic coordinates are available, then the FSC between map and model provides validation of all steps in the process, although great care is still needed with any flexible fitting procedure. In many cases, however, such as for novel structures at low resolution, atomic coordinates are not available. (a) 3D map of beta-galactosidase from Fig. 3(b) obtained using Relion with rigid-body-fitted atomic model superimposed. The arrow shows a beta hairpin. (b) FSC between the Relion map and atomic model. A resolution of 7.6 Å is estimated at 0.5 FSC for a rigid body fit, and 7.2 Å using a jelly body flexible fit to data truncated at 7.5 Å. The value of FSC 0.5 is used in this comparison because the map is calculated from all the images rather than only half and the atomic model is assumed to be perfect. (c) Tilt pair parameter plot, which is important for validation at lower (below 1/15 Å⁻¹) resolutions. (d) A typical comparison of the FSC curve (from Fig. 3(c)) for a structure with that obtained after HR-noise substitution. In this case, about one third of the correlation between the two half data sets consists of overfitted noise at 9 Å, and about half at 7 Å. A genuine resolution of 7.0 Å is estimated from FSCtrue (green symbols), rather than the 6.3 Å value that would be falsely suggested by the overfitted noise (red symbols). The value of FSC 0.143 is used in this comparison because both maps are calculated from only half the images so both contain more noise than the map used for (b). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
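Reading a resolution off an FSC curve at a chosen threshold (0.5 for map versus model, 0.143 for half map versus half map, as in this legend) can be sketched as follows. The function name and the linear interpolation between neighbouring shells are assumptions for illustration; packages differ in how they interpolate and in how they treat curves that cross the threshold more than once.

```python
import numpy as np

def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Resolution (Å) where the FSC first drops below `threshold`.

    freqs: spatial frequencies in 1/Å, in ascending order.
    fsc:   FSC value at each frequency.
    Returns None if the curve never falls below the threshold.
    """
    freqs = np.asarray(freqs, dtype=float)
    fsc = np.asarray(fsc, dtype=float)
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0:
        return None
    i = below[0]
    if i == 0:
        return 1.0 / freqs[0] if freqs[0] > 0 else None
    f0, f1 = freqs[i - 1], freqs[i]
    c0, c1 = fsc[i - 1], fsc[i]
    # Linear interpolation between the last shell at or above the
    # threshold and the first shell below it.
    f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
    return 1.0 / f_cross
```

Passing the overfitting-corrected curve instead of the raw half-map FSC gives the kind of comparison made in panel (d), where the two curves cross the 0.143 threshold at different resolutions.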
