Front Neuroinform. 2014 Jan 16:7:50.
doi: 10.3389/fninf.2013.00050. eCollection 2013.

Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease

Denis P Shamonin et al. Front Neuroinform. 2014.

Abstract

Nonrigid image registration is an important but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, e.g., for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4-5x on an 8-core machine. Using OpenCL, a speedup factor of 2 was realized for computation of the Gaussian pyramids, and of 15-60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operating characteristic curve of 88% and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license.

Keywords: Alzheimer's disease; OpenCL; acceleration; elastix; image registration; parallelization.

Figures

Figure 1
Design of the resample filter on the GPU. We select a chunk of the output image, initialize it (red kernel), and for that chunk a series of transformations T_n(… T_2(T_1(x))) are computed and stored in the intermediate transformation field (green kernels). After these transformation kernels have finished, the input image is interpolated and the result is stored in the output image chunk (blue kernel). Then we proceed to the next chunk. The loops in purple are computed in parallel.
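The control flow in this caption can be illustrated with a minimal CPU-only sketch. It is not the OpenCL implementation from the paper: the function names, the 1D image, and the linear interpolator are assumptions chosen for brevity, and the chunks are processed sequentially here, whereas the caption states that the marked loops run in parallel on the GPU.

```python
import math


def linear_interpolate(image, x, default):
    """Linearly interpolate a 1D image at continuous position x."""
    if x < 0 or x > len(image) - 1:
        return default  # outside the input image
    lo = int(math.floor(x))
    hi = min(lo + 1, len(image) - 1)
    w = x - lo
    return (1.0 - w) * image[lo] + w * image[hi]


def resample_in_chunks(input_image, transforms, output_size, chunk_size,
                       default=0.0):
    """Resample chunk by chunk, mirroring the three kernel stages.

    `transforms` is a list [T1, T2, ..., Tn] of hypothetical callables
    mapping a coordinate to a coordinate; T1 is applied first.
    """
    output = [default] * output_size
    for start in range(0, output_size, chunk_size):
        stop = min(start + chunk_size, output_size)
        # Stage 1 (initialize): the field starts as the chunk's coordinates.
        field = [float(x) for x in range(start, stop)]
        # Stage 2 (transform): update the intermediate field with T1..Tn.
        for transform in transforms:
            field = [transform(x) for x in field]
        # Stage 3 (interpolate): sample the input at the transformed points.
        for offset, x in enumerate(field):
            output[start + offset] = linear_interpolate(input_image, x, default)
    return output
```

Because each chunk only needs an intermediate field of its own size, the chunk size bounds the working memory, which is presumably the motivation for chunking on a device with limited memory.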
Figure 2
Image spaces defined within the ADNI structural MRI data: image space (Ω_I) and the template space (Ω_Template). Another image space (Ω_Atlas) is defined for the 30 atlas images. Transformations between the image spaces are indicated by S, U, V, and W. The arrows are pointing from the fixed to the moving domain. Different subjects are represented by i and j, the different atlas images are represented by k. From all I_i, a template space image (I) is calculated (Section 3.3.3).
Figure 3
The region labeling consisting of 72 ROIs in the brain.
Figure 4
Registration performance as a function of the number of threads. R_i denotes the resolution number, b refers to the baseline un-accelerated algorithm, and the numbers 1–16 refer to the number of threads used when running the parallel accelerated algorithm. The blue line shows ideal linear speedup. Results are shown for MI, N = 5·10^4, |Ω̃_F| = 2000. (A) shows the runtime per iteration, (B) the speedup factor, and (C) the efficiency.
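The formulas for the speedup factor and the efficiency did not survive in this caption. The sketch below uses the conventional definitions (speedup as baseline runtime divided by accelerated runtime, efficiency as speedup divided by thread count); that these match the paper's exact definitions is an assumption, and the timings in the example are hypothetical.

```python
def speedup(baseline_time, accelerated_time):
    """Conventional speedup: how many times faster than the baseline."""
    return baseline_time / accelerated_time


def efficiency(baseline_time, accelerated_time, threads):
    """Conventional parallel efficiency: speedup per thread (1.0 is ideal)."""
    return speedup(baseline_time, accelerated_time) / threads


# Hypothetical timings: 8.0 s per iteration baseline, 2.0 s with 8 threads.
example_speedup = speedup(8.0, 2.0)            # 4.0
example_efficiency = efficiency(8.0, 2.0, 8)   # 0.5
```

Under these definitions, the 4-5x acceleration on an 8-core machine reported in the abstract corresponds to an efficiency of roughly 0.5 to 0.625 if all eight cores are used.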
Figure 5
Registration performance as a function of the number of threads. R_i denotes the resolution number, b refers to the baseline un-accelerated algorithm, and the numbers 1–16 refer to the number of threads used when running the parallel accelerated algorithm. The blue line shows ideal linear speedup. Results are shown for varying the number of samples (A), the number of registration parameters (B), and the cost function (C).
Figure 6
Speedup factors for the GPU resampling framework. Results are shown for the nearest neighbor (A), the linear (B), and the 3rd order B-spline interpolator (C).
Figure 7
Resample example for the highest nRMSE of Table 3 (NN, A, 100^3). Differences are due to 79 isolated voxels in the range [−743, 502]. Shown are the result for the CPU (A), the GPU (B), and their difference (C).
Figure 8
Registration result for the median case of the voxel-wise method with an RMSE of 0.419 mm. The fixed T1w image, the transformed moving T1w image registered with the original and the accelerated version of elastix, and the difference between the two resulting images are shown.
Figure 9
Bland-Altman plot of the region-wise features for the original and accelerated versions of elastix. The features represent the GM volume per brain ROI divided by the intracranial volume. The average features were grouped in bins of width 0.001; for each bin a boxplot is shown. 72 features for 299 subjects are included. The mean difference between the features is 1.0 · 10^−7 (CI: −5.2 · 10^−5; 5.2 · 10^−7); the mean and CI are indicated with the striped and dotted lines in the figure.
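The quantities behind a Bland-Altman plot can be computed as in the following sketch. It assumes the interval is the common 95% limits of agreement (mean difference ± 1.96 standard deviations) and that differences are taken as original minus accelerated; the caption does not state either choice, and the function name is hypothetical.

```python
import statistics


def bland_altman(original, accelerated):
    """Return per-pair averages, per-pair differences, the mean difference,
    and the assumed 95% limits of agreement (mean ± 1.96 * SD)."""
    if len(original) != len(accelerated):
        raise ValueError("both feature lists must have the same length")
    averages = [(a + b) / 2.0 for a, b in zip(original, accelerated)]
    differences = [a - b for a, b in zip(original, accelerated)]
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)  # sample standard deviation
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return averages, differences, mean_diff, limits
```

The averages go on the horizontal axis and the differences on the vertical axis; the binning into boxplots described in the caption would be an additional step on top of these values.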
Figure 10
Template space for the voxel-wise features constructed with the original version of elastix (top row) and the accelerated version (middle row). The difference between the two is shown at the bottom row.
Figure 11
Receiver operating characteristic (ROC) curves for the classification based on voxel-wise (red, blue) and region-wise features (magenta, green) calculated with the original and accelerated versions of elastix. The area under the curve (AUC), given in brackets, is the performance measure.
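As a reminder of what the AUC measures, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counted as one half. This is a generic illustration, not the evaluation code used in the study; the function name and the scores in the usage example are hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier outputs for patients (positive) and controls.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for a few hundred cases; a sort-based rank computation would scale better.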
