J Struct Biol X. 2020 Jul 21:4:100032.
doi: 10.1016/j.yjsbx.2020.100032. eCollection 2020.

Validation tests for cryo-EM maps using an independent particle set

Sebastian Ortiz et al. J Struct Biol X.

Abstract

Cryo-electron microscopy (cryo-EM) has revolutionized structural biology by providing 3D density maps of biomolecules at near-atomic resolution. However, map validation is still an open issue. Despite several efforts from the community, it is still possible to overfit 3D maps to noisy data. Here, we develop a novel methodology that uses a small independent particle set (not used during the 3D refinement) to validate the maps. The main idea is to monitor how the map probability evolves over the control set during the 3D refinement. The method is complementary to the gold-standard procedure, which generates two reconstructions at each iteration. We low-pass filter the two reconstructions at different frequency cutoffs, and we calculate the probability of each filtered map given the control set. For high-quality maps, the probability should increase as a function of the frequency cutoff and the refinement iteration. We also compute the similarity between the probability distributions of the two reconstructions. As higher frequencies are included, the distributions become more dissimilar. We optimized the BioEM package to perform these calculations, and tested it on systems ranging from high-quality data to pure noise. Our results show that with our methodology, it is possible to discriminate datasets that are constructed from noise particles. We conclude that validation against a control particle set provides a powerful tool to assess the quality of cryo-EM maps.
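
The low-pass filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a hard spherical cutoff in Fourier space applied to a cubic-grid map; the function name `low_pass_filter`, the `voxel_size` parameter, and the choice of a hard (rather than smooth) cutoff are illustrative assumptions, not the filter used by the authors or by BioEM.

```python
import numpy as np

def low_pass_filter(volume, voxel_size, k_c):
    """Zero every Fourier component with spatial frequency above k_c.

    volume     : 3D real-valued density map (hypothetical input)
    voxel_size : grid spacing, e.g. in Angstrom (assumed unit)
    k_c        : frequency cutoff in inverse units of voxel_size
    """
    # Spatial frequency of each Fourier voxel along each axis.
    freqs = [np.fft.fftfreq(n, d=voxel_size) for n in volume.shape]
    kx, ky, kz = np.meshgrid(*freqs, indexing="ij")
    k = np.sqrt(kx**2 + ky**2 + kz**2)

    # Hard spherical mask: keep |k| <= k_c, discard the rest.
    ft = np.fft.fftn(volume)
    ft[k > k_c] = 0.0
    return np.fft.ifftn(ft).real
```

Calling this for a series of increasing `k_c` values yields the family of filtered maps whose probabilities would then be evaluated against the control set.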

Keywords: 3D refinement; BioEM; Cryo-EM; Independent; Raw data; Reconstruction; Validation.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Unbiased validation protocol for cryo-EM maps using an independent particle set. (left) Gold-standard refinement procedure in cryo-EM. Two particle sets are used to generate two independent reconstructions. These reconstructions are compared using the Fourier shell correlation (FSC). A fixed FSC threshold is used to extract the resolution of the reconstructions. The process is iterated until the resolution stops improving. (right) Novel validation protocol using a small independent particle set not included in the gold-standard refinement. At each iteration of the refinement, the reconstructions are low-pass filtered to different frequency cutoffs $k_c$. The BioEM probabilities (Cossio and Hummer, 2013; Cossio et al., 2017), over the independent control set, are calculated as a function of $k_c$. Two tests validate the quality of the reconstructions: 1) the map evidence of the log-posterior and 2) the statistical similarity between the probability distributions (measured with a normalized Jensen-Shannon divergence). The results from both tests should increase as a function of the frequency cutoff. The maps shown correspond to the RAG1-RAG2 complex (see Methods).
Fig. 2
The sum of the log-posterior relative to noise, $\sum_{\omega} \ln(P_{i\omega})/N - \ln(P_{\mathrm{Noise}})$, over the control set with $N$ particles, as a function of the frequency cutoff for reconstructions from set $i=1$ and $2$ (solid and dashed lines, respectively). The results are shown for different refinement iteration steps with a gradient color code: the first iteration is maroon and the last iteration is green. On the top row, we show the results for the cryo-EM systems where we expect refinement to work properly: HCN1, TRPV1 and RAG1-RAG2 for $N=5000$. Systems that present signs of treating noise as signal, the HIV-ET with $N=5000$ and a noise particle control set with $N=1000$, are shown in the bottom row, highlighted with a red box.
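
As a small illustration of the quantity plotted in Fig. 2, the sketch below averages per-particle log-probabilities over the control set and subtracts a noise reference. It assumes the log-probabilities have already been computed elsewhere (for example by BioEM) and that the noise term is available as a single scalar; the names `log_p` and `log_p_noise` are hypothetical.

```python
import numpy as np

def log_posterior_relative_to_noise(log_p, log_p_noise):
    """Return sum_w ln(P_iw) / N - ln(P_Noise) for one filtered map.

    log_p       : ln(P_iw) for each of the N control particles w (assumed precomputed)
    log_p_noise : ln(P_Noise), treated here as a scalar (assumption)
    """
    log_p = np.asarray(log_p, dtype=float)
    n_particles = log_p.size
    return log_p.sum() / n_particles - log_p_noise
```

A positive value means the filtered map explains the control particles better than the noise reference; per the caption, this value is expected to grow with the frequency cutoff for well-behaved refinements.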
Fig. 3
Normalized Jensen-Shannon divergence (NJSD) as a function of the frequency cutoff. This metric calculates the similarity between the distributions of the BioEM probabilities computed for the two reconstructions from sets 1 and 2. We use a gradient color code for the refinement iteration steps: the first iteration is maroon and the last iteration is green. On the top row, we show the results for the systems where standard cryo-EM refinement is expected to work: HCN1, TRPV1 and RAG1-RAG2. For these systems, we fit the data points to an inverse exponential function $-A e^{-k_c/\gamma} + B$ (solid lines). Systems that treat noise as signal due to the alignment, a noise particle control set and HIV-ET, are shown in the bottom row with the dashed lines as a guide, and highlighted by a red box. The number of particles in each control set is the same as for the data in Fig. 2. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
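
One possible way to compute a normalized Jensen-Shannon divergence between two sets of per-particle log-probabilities is sketched below. The histogram binning and the normalization by ln 2 (which bounds the result to [0, 1]) are assumptions made for illustration; the normalization used by the authors is not stated in this record and may differ.

```python
import numpy as np

def normalized_jsd(samples_1, samples_2, bins=50):
    """Jensen-Shannon divergence of two samples, divided by ln(2).

    samples_1, samples_2 : values (e.g. log-probabilities over the control set)
                           obtained with reconstructions 1 and 2 (hypothetical inputs)
    bins                 : number of shared histogram bins (assumed choice)
    """
    s1 = np.asarray(samples_1, dtype=float)
    s2 = np.asarray(samples_2, dtype=float)

    # Histogram both samples on a common range so the bins line up.
    lo = min(s1.min(), s2.min())
    hi = max(s1.max(), s2.max())
    p, _ = np.histogram(s1, bins=bins, range=(lo, hi))
    q, _ = np.histogram(s2, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Kullback-Leibler divergence, skipping empty bins of a.
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return (0.5 * kl(p, m) + 0.5 * kl(q, m)) / np.log(2)
```

With this convention, identical distributions give 0 and distributions with no overlapping bins give 1.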
Fig. 4
Frequency $\gamma$ versus the inverse of the resolution for the standard cryo-EM systems: HCN1, TRPV1 and RAG1-RAG2. The NJSD curves for these systems were fitted to an inverse exponential function $-A e^{-k_c/\gamma} + B$, where $\gamma$ is the frequency. We find large correlations between $\gamma$ and the inverse of the resolution (calculated using the 0.143 criterion). The correlation coefficients are $r^2 = 0.93$, $0.91$, and $0.85$ for HCN1, TRPV1 and RAG1-RAG2, respectively. Solid lines show the linear fits to the individual sets. The black dashed line shows the global fit with parameters $\gamma = 0.42/R - 0.02$, where $R$ is the resolution. The correlation coefficient for the global fit is $r^2 = 0.86$.
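
The two formulas in this caption can be written out as a short sketch. The function names are hypothetical, and the units (resolution in Angstrom, frequencies in inverse Angstrom) are an assumption, since the caption does not state them.

```python
import numpy as np

def njsd_model(k_c, A, gamma, B):
    """Inverse exponential used to fit the NJSD curves: -A * exp(-k_c / gamma) + B."""
    return -A * np.exp(-np.asarray(k_c, dtype=float) / gamma) + B

def gamma_from_resolution(R):
    """Global linear fit reported in the caption: gamma = 0.42 / R - 0.02."""
    return 0.42 / R - 0.02
```

For example, under the assumed units a map at R = 3.5 gives gamma = 0.42/3.5 - 0.02 = 0.10. The model starts at B - A for a cutoff of zero and saturates at B once the cutoff is much larger than gamma.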
