Review
Curr Opin Microbiol. 2018 Jun;43:1-8.
doi: 10.1016/j.mib.2017.10.005. Epub 2017 Oct 31.

Big data in cryoEM: automated collection, processing and accessibility of EM data

Philip R Baldwin et al. Curr Opin Microbiol. 2018 Jun.

Abstract

The scope and complexity of cryogenic electron microscopy (cryoEM) data have greatly increased, and will continue to do so, due to recent and ongoing technical breakthroughs that have led to much-improved resolutions for macromolecular structures solved using this method. This big data explosion includes single particle data as well as tomographic tilt series, both generally acquired as direct detector movies of ∼10-100 frames per image or per tilt series. We provide a brief survey of the developments leading to the current status, and describe existing cryoEM pipelines, with an emphasis on the scope of data acquisition, methods for automation, and use of cloud storage and computing.

Conflict of interest statement

None.

Figures

Figure 1
A typical automated workflow, applied to the protein apoferritin, produces a 2.8 Å resolution 3D map within a few hours of sample insertion. a) 133 movies (totaling 428 gigabytes) were collected in a single session using the Leginon [45] automated data collection software. b) Movies were corrected for motion using MotionCor2 [15], CTF was estimated using CTFFIND4 [96], and 27,324 particles were picked from images with suitable drift and CTF parameters (3.8 gigabytes of metadata). c) 2D classification in RELION [33] selected 15,835 suitable particles (39 gigabytes); the remaining particles fell into classes judged not to represent the structure (primarily noise, contamination, or misshapen particles). d) 3D classification and refinement (RELION [33]) yielded the final 2.8 Å 3D map (72 gigabytes); modeling was performed using Chimera [97].
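To put the Figure 1 numbers in proportion, the short sketch below derives the average movie size, the number of picks per movie, and the fraction of particles that survive 2D classification. It uses only the counts quoted in the caption; the function name and dictionary keys are illustrative and not part of any cryoEM package, and decimal gigabytes are assumed.

```python
# Counts quoted in the Figure 1 caption (decimal gigabytes assumed).
MOVIES = 133
MOVIE_DATA_GB = 428
PICKED_PARTICLES = 27_324
KEPT_PARTICLES = 15_835


def summarize_session(movies, movie_gb, picked, kept):
    """Derive per-movie averages and the 2D-classification retention rate.

    Hypothetical helper for illustration only.
    """
    return {
        "gb_per_movie": movie_gb / movies,
        "picked_per_movie": picked / movies,
        "retained_fraction": kept / picked,
    }


summary = summarize_session(
    MOVIES, MOVIE_DATA_GB, PICKED_PARTICLES, KEPT_PARTICLES
)
print(f"Average movie size: {summary['gb_per_movie']:.2f} GB")
print(f"Particles picked per movie: {summary['picked_per_movie']:.0f}")
print(f"Retained after 2D classification: {summary['retained_fraction']:.1%}")
```

On these figures each movie averages roughly 3.2 gigabytes, and about 58% of the picked particles are carried forward to 3D refinement.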
Figure 2
Leginon/Appion tomography workflow used for single particles and other more complex samples. a) Leginon [45] view of data collection. During collection at the microscope, the software displays, from left to right, the previous tilt image, the current tilt image, and the correlation peak, which is needed for predictive tracking. One hour of collection results in about 3 tilt series (10.5 gigabytes) of unaligned and aligned images. b) Processing summary for alignment of the tilt series within the Appion-Protomo tilt-series alignment suite [98]. Columns list the quality of alignment, number of alignments, tilt parameters, and links to summaries and results of tilt-series refinements (approximately 60 gigabytes for 3 tilt series). c) Final 3D reconstructions: left, reconstruction of single particles; right, reconstruction of cell adhesion receptors attached to liposomes (courtesy of Julia Brasch and Lawrence Shapiro) (from 3 to 300 gigabytes, depending on binning). Following the creation of the 3D tomographic volume, the remaining steps in the pipeline vary and depend on the goal of the project. Particles can be excised in 3D for sub-tomogram averaging, or regions of the tomogram may be visually analyzed without averaging.
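The per-hour rates in the Figure 2 caption can be turned into a rough storage estimate for a longer tomography session. The sketch below assumes the quoted rates scale linearly with collection time, which the source does not state; the function name and the 8-hour session length are hypothetical. Reconstruction volumes are left out because the caption gives only a 3 to 300 gigabyte range that depends on binning.

```python
# Rates quoted in the Figure 2 caption (approximate values).
TILT_SERIES_PER_HOUR = 3
RAW_GB_PER_HOUR = 10.5          # unaligned and aligned images
ALIGNMENT_GB_PER_3_SERIES = 60  # Appion-Protomo refinement output


def estimate_tomo_storage(hours):
    """Estimate tilt-series count and storage for a session.

    Assumes linear scaling with collection time (an assumption,
    not a claim from the source).
    """
    series = TILT_SERIES_PER_HOUR * hours
    raw_gb = RAW_GB_PER_HOUR * hours
    alignment_gb = ALIGNMENT_GB_PER_3_SERIES * series / 3
    return series, raw_gb, alignment_gb


# Hypothetical 8-hour session.
series, raw_gb, alignment_gb = estimate_tomo_storage(8)
print(f"{series} tilt series, {raw_gb:.1f} GB images, "
      f"{alignment_gb:.1f} GB alignment output")
```

Under that linear assumption, alignment output dominates the image data by a factor of about six, before any reconstructed volumes are stored.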

References

    1. Hilbert M. Big data for development: A review of promises and challenges. Development Policy Review. 2016;34(1):135–174.
    2. Taylor RC. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinformatics. 2010;11(Suppl 12):S1.
    3. Liu Y, et al. MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning. Comput Intell Neurosci. 2015;2015:297672.
    4. Kuhlbrandt W. Biochemistry. The resolution revolution. Science. 2014;343(6178):1443–4. The article that coined the term “Resolution Revolution” in cryoEM, and provides a succinct overview of the events.
    5. Ekiert DC, et al. Architectures of Lipid Transport Systems for the Bacterial Outer Membrane. Cell. 2017;169(2):273–285e17.
