A streaming multi-GPU implementation of image simulation algorithms for scanning transmission electron microscopy

Alan Pryor Jr et al. Adv Struct Chem Imaging. 2017;3(1):15. doi: 10.1186/s40679-017-0048-z. Epub 2017 Oct 25.

Abstract

Simulation of atomic-resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. Here, we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000 × for PRISM and 15 × for multislice are achieved relative to traditional multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic, using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic.

Keywords: Atomic electron tomography; CUDA; Electron scattering; GPU; High performance computing; Imaging simulation; Multislice; PRISM; Scanning transmission electron microscopy.
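
The abstract notes that Prismatic is also distributed as the Python package PyPrismatic. As a minimal sketch of driving a simulation through that interface (field names follow the pyprismatic.Metadata object; the file names and parameter values here are illustrative placeholders, not settings from the paper):

    # Minimal PyPrismatic sketch; assumes the package is installed with
    # GPU support. File names and parameter values are placeholders.
    import pyprismatic as pr

    meta = pr.Metadata(filenameAtoms="myAtoms.xyz",     # input atomic coordinates
                       filenameOutput="myResults.mrc")  # simulated output
    meta.algorithm = "prism"          # or "multislice"
    meta.interpolationFactorX = 4     # PRISM interpolation factor f
    meta.interpolationFactorY = 4
    meta.numGPUs = 4                  # distribute work across available GPUs
    meta.numThreads = 16              # CPU worker threads
    meta.go()                         # run the simulation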


Figures

Fig. 1
Flow chart of STEM simulation algorithm steps. a All atoms are separated into slices at different positions along the beam direction, and b atomic scattering factors are used to compute the projected potential of each slice. c Multislice algorithm, where each converged probe is initialized, d propagated through each of the sample slices defined in (b), and then e output either as images or as radially integrated detector signals. f PRISM algorithm, where g converged probes are defined as a set of plane waves in a coordinate system downsampled by a factor f. h Each required plane wave is propagated through the sample slices defined in (b). i Output probes are computed by cropping a subset of plane waves multiplied by the probe's complex coefficients, and j summed to form the output probe, k which is then saved
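
Steps i–k, where each output probe is assembled by cropping and summing weighted plane waves, are the heart of PRISM's speedup. A NumPy sketch of that assembly (array names and shapes are illustrative, not Prismatic's internal interface):

    # Sketch of PRISM probe assembly (Fig. 1 i-k). Assumes a precomputed
    # scattering matrix S of shape (num_beams, Ny, Nx), one propagated
    # plane wave per beam; names and shapes are illustrative only.
    import numpy as np

    def assemble_probe(S, coeffs, center, half_size):
        """Crop each plane wave about the probe position, weight it by the
        probe's complex coefficient for that beam, and accumulate."""
        cy, cx = center
        ys = slice(cy - half_size, cy + half_size)
        xs = slice(cx - half_size, cx + half_size)
        probe = np.zeros((2 * half_size, 2 * half_size), dtype=complex)
        for alpha, plane_wave in zip(coeffs, S):
            probe += alpha * plane_wave[ys, xs]
        return probe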
Fig. 2
Visualization of the computation model used repeatedly in the Prismatic software package, whereby a pool of GPU and CPU workers is assigned batches of work by querying a synchronized work dispatcher. Once a batch is complete, the worker requests more work until none remains. All workers record completed simulation outputs in parallel
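
As a rough illustration of this dispatcher pattern (Python threads standing in for Prismatic's C++/CUDA workers; the batch size of probe positions per request is arbitrary here):

    # Sketch of the synchronized work dispatcher from Fig. 2.
    import threading

    class WorkDispatcher:
        """Hands out batches of work-item indices under a lock."""
        def __init__(self, num_items, batch_size):
            self.next_item, self.num_items = 0, num_items
            self.batch_size = batch_size
            self.lock = threading.Lock()

        def get_work(self):
            with self.lock:                     # synchronize competing workers
                if self.next_item >= self.num_items:
                    return None                 # no work remains
                start = self.next_item
                self.next_item = min(start + self.batch_size, self.num_items)
                return range(start, self.next_item)

    def worker(dispatcher, process_item):
        while (batch := dispatcher.get_work()) is not None:
            for item in batch:                  # e.g. one probe position each
                process_item(item)

    dispatcher = WorkDispatcher(num_items=512 * 512, batch_size=16)
    threads = [threading.Thread(target=worker, args=(dispatcher, lambda i: None))
               for _ in range(8)]
    for t in threads: t.start()
    for t in threads: t.join()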
Fig. 3
a Sample profile of the GPU activities on a single NVIDIA GTX 1070 during a multislice simulation in streaming mode, with b an enlarged inset containing a window in which computation occurs on streams #1 and #5 while three separate arrays are simultaneously copied on streams #2–4
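
This kind of compute/transfer overlap can be reproduced in miniature with CUDA streams. A CuPy sketch (CuPy is an assumption here, not a dependency of Prismatic; real overlap also requires pinned host memory, omitted for brevity):

    # Sketch of overlapping kernel execution and a host-to-device copy
    # on separate CUDA streams, as profiled in Fig. 3.
    import cupy as cp
    import numpy as np

    compute_stream = cp.cuda.Stream(non_blocking=True)
    copy_stream = cp.cuda.Stream(non_blocking=True)

    x = cp.random.random((4096, 4096))           # data already on the device
    host_array = np.random.random((2048, 2048))  # data still on the host
    device_buffer = cp.empty((2048, 2048), dtype=np.float64)

    with compute_stream:
        y = cp.fft.fft2(x)                       # compute on one stream

    device_buffer.set(host_array, stream=copy_stream)  # copy on another

    compute_stream.synchronize()
    copy_stream.synchronize()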
Fig. 4
Comparison of the CPU/GPU implementations of the PRISM and multislice algorithms described in this work. A 100 × 100 × 100 Å amorphous carbon cell was divided into slices of varying thickness and sampled with progressively smaller pixels in real space, corresponding to digitized probes of array size 256 × 256, 512 × 512, 1024 × 1024, and 2048 × 2048, respectively. Two different PRISM simulations are shown: a more accurate case with interpolation factor f=4 (left), and a faster case with f=16 (right). The multislice simulation is the same in both columns. Power laws of the form $A + B q_{\max}^n$ were fit where possible. The asymptotic power laws for higher scattering angles are shown to the right of each curve
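
These asymptotic fits can be reproduced with a nonlinear least-squares fit of t = A + B * q_max**n; a SciPy sketch with placeholder data (not the paper's measurements):

    # Sketch of fitting the power law t = A + B * q_max**n from Fig. 4.
    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(q_max, A, B, n):
        return A + B * q_max**n

    q_max = np.array([1.0, 2.0, 4.0, 8.0])     # placeholder sampling limits
    times = np.array([0.5, 2.1, 15.8, 124.0])  # placeholder timings (s)

    (A, B, n), _ = curve_fit(power_law, q_max, times, p0=(0.0, 1.0, 3.0))
    print(f"asymptotic scaling: t ~ q_max^{n:.2f}")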
Fig. 5
Comparison of the implementations of multislice and PRISM for varying combinations of CPU threads and GPUs. The simulation was performed on a 100 × 100 × 100 Å amorphous carbon cell with 5 Å thick slices, 0.1 Å pixel size, and a 20 mrad probe convergence semi-angle. All simulations were performed on compute nodes with dual Intel Xeon E5-2650 processors, four Tesla K20 GPUs, and 64 GB RAM. The calculation time of the rightmost data point is labeled for each curve
Fig. 6
Comparison of a relative performance and b peak memory consumption for single-transfer and streaming implementations of PRISM and multislice
Fig. 7
Comparison of simulation results produced by a computem, b MULTEM, and c–g Prismatic. The sample is composed of 27 × 27 × 27 pseudocubic perovskite unit cells, and images were simulated using 80 keV electrons, a 20 mrad probe convergence semi-angle, 0 Å defocus, and 1520 × 1536 pixel sampling for the probe and projected potential. A total of 512 × 512 probe positions were computed, and the final images are an average over 64 frozen phonon configurations. Separate PRISM simulations were performed with interpolation factors 4, 8, 12, and 16. Line scans corresponding to the positions of the red/blue arrows are shown in the right-hand column. As the various simulations produce results with differing absolute intensity scales, all images were scaled to have the same mean intensity as Prismatic multislice
Fig. 8
Images from one projection of an atomic electron tomography tilt series of an FePt nanoparticle [14], from a experiment, b linear projection of the reconstruction, c multislice simulation, and d–f PRISM simulations for f=8, 16, and 32, respectively. g Relative root-mean-square error of the images in (b–f) relative to (a). h Calculation times per frozen phonon configuration for (c–f). All simulations were performed with Prismatic
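
A relative root-mean-square error of this kind can be computed as follows (normalizing by the RMS of the reference image is one common convention; the paper may define the normalization differently):

    # Sketch of a relative RMS error between a simulated image and the
    # experimental reference, as in Fig. 8g.
    import numpy as np

    def relative_rmse(simulated, reference):
        err = np.sqrt(np.mean((simulated - reference) ** 2))
        return err / np.sqrt(np.mean(reference ** 2))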

References

    1. Crewe AV. Scanning transmission electron microscopy. J Microsc. 1974;100(3):247–259. doi: 10.1111/j.1365-2818.1974.tb03937.x.
    2. Nellist PD. Scanning transmission electron microscopy. New York: Springer; 2007. pp. 65–132.
    3. Batson P, Dellby N, Krivanek O. Sub-ångstrom resolution using aberration corrected electron optics. Nature. 2002;418(6898):617–620. doi: 10.1038/nature00972.
    4. Muller DA. Structure and bonding at the atomic scale by scanning transmission electron microscopy. Nat Mater. 2009;8(4):263–270. doi: 10.1038/nmat2380.
    5. Pennycook SJ. The impact of STEM aberration correction on materials science. Ultramicroscopy. 2017;180:22–33. doi: 10.1016/j.ultramic.2017.03.020.
