MAGMA. 2025 Aug;38(4):679-694.
doi: 10.1007/s10334-025-01281-z. Epub 2025 Sep 4.

GPU-accelerated JEMRIS for extensive MRI simulations

Aizada Nurdinova et al. MAGMA. 2025 Aug.

Abstract

Purpose: To enable accelerated Bloch simulations by enhancing the open-source multi-purpose MRI simulation tool JEMRIS with graphics processing unit (GPU) parallelization.
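The Bloch equations that such simulators integrate can be illustrated with a minimal example. The sketch below (an illustration, not JEMRIS code) evolves a single spin's magnetization in the rotating frame under off-resonance and T1/T2 relaxation with a fixed-step RK4 integrator, a toy stand-in for the adaptive CVODE solver JEMRIS uses:

```python
import numpy as np

def bloch_rhs(m, t, b_z, t1, t2, m0=1.0):
    """Right-hand side of the rotating-frame Bloch equations for one spin:
    precession about a z-field b_z plus T1/T2 relaxation (time-independent here)."""
    gamma = 2 * np.pi * 42.576e6          # 1H gyromagnetic ratio [rad/s/T]
    mx, my, mz = m
    omega = gamma * b_z                   # off-resonance angular frequency [rad/s]
    return np.array([
        omega * my - mx / t2,             # dMx/dt
        -omega * mx - my / t2,            # dMy/dt
        (m0 - mz) / t1,                   # dMz/dt
    ])

def integrate(m, dt, n, **kw):
    """Fixed-step RK4 integration of the Bloch ODE (toy stand-in for CVODE)."""
    for i in range(n):
        t = i * dt
        k1 = bloch_rhs(m, t, **kw)
        k2 = bloch_rhs(m + 0.5 * dt * k1, t + 0.5 * dt, **kw)
        k3 = bloch_rhs(m + 0.5 * dt * k2, t + 0.5 * dt, **kw)
        k4 = bloch_rhs(m + dt * k3, t + dt, **kw)
        m = m + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return m

# Start from transverse magnetization (after a 90° pulse) and relax for 10 ms.
m_end = integrate(np.array([1.0, 0.0, 0.0]), dt=1e-5, n=1000,
                  b_z=0.0, t1=1.0, t2=0.1)
```

On resonance, the result matches the analytic solution Mx(t) = exp(-t/T2) and Mz(t) = 1 - exp(-t/T1); realistic simulations repeat this per spin over the full sequence timeline, which is what makes GPU parallelization attractive.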

Methods: A GPU-compatible version of JEMRIS was built by shifting the computationally expensive parallelizable processes to the GPU to benefit from heterogeneous computing and by adding asynchronous communication and mixed precision support. With key classes reimplemented in CUDA C++, the developed GPU-JEMRIS framework was tested on simulations of common MRI artifacts in numerical phantoms. The accuracy and performance of the GPU-parallelized JEMRIS simulator were benchmarked against the CPU-parallelized JEMRIS and GPU-enabled KomaMRI.jl simulators. Additionally, an example of liver fat quantification errors due to respiratory motion artifacts was simulated in a multi-echo gradient echo (MEGRE) acquisition.
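The core parallelization idea, evaluating the ODE right-hand side for all spins at once instead of looping over them sequentially, can be sketched with a vectorized NumPy analogue of the CUDA kernel (function and variable names here are illustrative, not the JEMRIS API):

```python
import numpy as np

def bloch_rhs_all_spins(m, omega, r1, r2, m0=1.0):
    """Evaluate dM/dt for every spin simultaneously.

    m     : (n_spins, 3) magnetization vectors
    omega : (n_spins,) per-spin off-resonance [rad/s]
    r1,r2 : (n_spins,) relaxation rates 1/T1, 1/T2 [1/s]

    One vectorized expression replaces the per-spin loop, mirroring how a
    CUDA kernel assigns one thread to each spin.
    """
    dm = np.empty_like(m)
    dm[:, 0] = omega * m[:, 1] - r2 * m[:, 0]      # dMx/dt
    dm[:, 1] = -omega * m[:, 0] - r2 * m[:, 1]     # dMy/dt
    dm[:, 2] = r1 * (m0 - m[:, 2])                 # dMz/dt
    return dm

rng = np.random.default_rng(0)
n = 1_000
m = rng.normal(size=(n, 3))
omega = rng.normal(scale=100.0, size=n)
r1, r2 = np.full(n, 1.0), np.full(n, 10.0)

dm = bloch_rhs_all_spins(m, omega, r1, r2)

# Identical to a naive sequential per-spin loop:
dm_loop = np.array([bloch_rhs_all_spins(m[i:i+1], omega[i:i+1],
                                        r1[i:i+1], r2[i:i+1])[0]
                    for i in range(n)])
assert np.allclose(dm, dm_loop)
```

Because each spin's derivative is independent of the others, the computation is embarrassingly parallel; only the coil-signal reduction over spins requires communication between threads.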

Results: The GPU-accelerated JEMRIS achieved speed-up factors of 3-12 and 7-65 using double and single precision numerical integrators, respectively, when compared to the parallelized CPU implementation in the investigated numerical phantom scenarios. While double precision GPU simulations differ negligibly (<0.1% NRMSE) from double precision CPU simulations, single precision simulations still show small errors of up to 1% k-space signal NRMSE. The developed GPU extension enabled computationally demanding motion simulations with a multi-species abdominal phantom and a MEGRE sequence, showing significant and spatially varying fat fraction bias in the presence of motion.
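The accuracy comparisons above rely on a normalized root-mean-square error between signals. A small sketch, assuming one common normalization convention (RMS of the reference; the paper's exact normalization is not stated in this abstract), shows how a single-precision round trip of a complex k-space signal produces the kind of small residual reported:

```python
import numpy as np

def nrmse(reference, test):
    """NRMSE between two complex signals, normalized by the RMS of the
    reference (one common convention; assumption for this sketch)."""
    err = np.sqrt(np.mean(np.abs(test - reference) ** 2))
    return err / np.sqrt(np.mean(np.abs(reference) ** 2))

# Double- vs. single-precision versions of the same synthetic k-space signal.
rng = np.random.default_rng(42)
signal_dp = rng.normal(size=256) + 1j * rng.normal(size=256)
signal_sp = signal_dp.astype(np.complex64).astype(np.complex128)

print(f"float32 round-trip NRMSE: {100 * nrmse(signal_dp, signal_sp):.2e} %")
```

The storage round trip alone gives an NRMSE near float32 machine precision; the larger errors reported in the paper accumulate through many single-precision integration steps.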

Conclusion: By solving the Bloch equations in parallel on the device, accelerated Bloch simulations can be performed on any CUDA-capable GPU using the developed GPU-JEMRIS. This enables further insights into more realistic large-spin-pool MR simulations, such as experiments with large multi-dimensional phantoms, multiple chemical species, and dynamic effects.

Keywords: Bloch simulations; GPU acceleration; JEMRIS; Motion artifacts; Quantitative MRI.


Conflict of interest statement

Declarations. Conflict of interest: Dimitrios Karampinos receives research grant support from Philips Healthcare outside the scope of the present work.

Figures

Fig. 1. Schematic class diagram of GPU-JEMRIS highlighting the main changes for GPU support in dark blue boxes. The following functionalities are modified: methods allocating GPU memory and copying computational-core data to the device are added to the classes World, Sample and Coil. The class Model communicates with the sequence tree on the host and calls the solver on the GPU at the timepoints of interest (TPOI) in RunSequenceTreeGPU(). The numerical solver on the GPU is implemented in Bloch_CV_Model: the function blochGPU() computes the ODE right-hand side for all spins in parallel, eliminating the sequential loop of the original CPU version. The Coil class interpolates input sensitivity maps and keeps them on the device throughout the simulations. The Bloch equations are then solved for all spins at every TPOI, and the Coil method ReceiveGPU performs asynchronous reduction on the device. The Sequence class is not modified for GPU support
Fig. 2. The train of function calls during GPU-JEMRIS simulations in the main and helper GPU streams, as well as in the CPU thread. Note that after GPU memory allocation, the host and CUDA streams run independently. Steps 1-3 involve data preparation, GPU memory allocation, and data transfer to the device (GPU). Steps 4-6 include iteration through the sequence tree on the host, CVODE model re-initialization, and solving Bloch equations on the device for each time point of interest (TPOI). At step 7, the estimated magnetization is multiplied by coil sensitivities and summed on the GPU. Steps 8-9 involve transferring the results from device to CPU, freeing up resources and completing the simulation process
Fig. 3. JEMRIS simulations of motion artifacts: input parameters are a numerical phantom with set magnetic properties (T1, T2, T2*, PD, Δω, χ), a motion trajectory as a function of time, an acquisition sequence, and transmit/receive coil sensitivity maps. The output of the simulator is the bulk magnetization vector at the acquisition timepoints, which can be converted to a k-space matrix by taking the transverse magnetization part, given the phase-encoding order in time
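The caption's final step, arranging the transverse magnetization into a k-space matrix according to the phase-encoding order, can be sketched as follows (shapes and the centric ordering are illustrative, not taken from the paper):

```python
import numpy as np

def signals_to_kspace(mx, my, pe_order, n_read):
    """Arrange acquired samples into a k-space matrix.

    mx, my   : (n_lines * n_read,) real/imag transverse magnetization at the
               readout timepoints, in acquisition order
    pe_order : (n_lines,) phase-encode line index of each acquired shot
    n_read   : readout samples per line
    """
    signal = (mx + 1j * my).reshape(len(pe_order), n_read)
    kspace = np.zeros_like(signal)
    kspace[pe_order, :] = signal          # place each shot at its ky line
    return kspace

# Toy 4-line acquisition with a centric phase-encode order.
pe_order = np.array([2, 1, 3, 0])
mx = np.arange(16, dtype=float)
my = np.zeros(16)
k = signals_to_kspace(mx, my, pe_order, n_read=4)
# image = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(k)))  # reconstruction
```

Reordering by `pe_order` is exactly why motion interacts with the sampling scheme: the same physical displacement corrupts different k-space regions depending on when each line is acquired.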
Fig. 4. Geometrical phantom simulations setup: (a) R1 = 1/T1 and R2 = 1/T2 maps (units [1/s]) of the constructed numerical phantom containing geometrical shapes enumerated 1-7 with MR properties as depicted in table (b). (c) Magnitude and phase maps of the simulated 4-channel reception coil array. (d) Simulated 2D multi-shot TSE sequence exported from the clinical scanner; sequence parameters are provided in Table 1
Fig. 5. Liver phantom simulations setup. (a) DUKE abdominal numerical phantom with assigned magnetic properties: the phantom includes one water and nine fat components with chemical shifts and relative proton density (PD) values varying according to the Hamilton fat model [31]. (b) Translational motion as a function of acquired k-space phase encoding line. (c) Sequence diagram of the quantitative six-echo gradient echo sequence exported from the clinical scanner and converted to the .xml format suitable for JEMRIS. Sequence parameters are provided in Table 1
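A multi-peak fat model of the kind referenced in this caption describes the voxel signal as S(TE) = W + F·Σₚ aₚ·exp(2πi·fₚ·TE). The sketch below uses illustrative peak amplitudes and frequencies (not the exact published Hamilton model values) and omits relaxation for brevity:

```python
import numpy as np

def megre_signal(te, water, fat, rel_amps, freqs_hz):
    """Complex multi-echo gradient-echo signal of a water + multi-peak fat
    voxel (relaxation omitted):  S(TE) = W + F * sum_p a_p * exp(2j*pi*f_p*TE)
    """
    phasor = np.sum(rel_amps[None, :] *
                    np.exp(2j * np.pi * freqs_hz[None, :] * te[:, None]),
                    axis=1)
    return water + fat * phasor

# Illustrative 3-peak fat spectrum (relative amplitudes summing to 1), 6 echoes.
rel_amps = np.array([0.7, 0.2, 0.1])
freqs_hz = np.array([-434.0, -332.0, -63.0])   # example chemical shifts at ~3 T
te = 1.2e-3 + 1.0e-3 * np.arange(6)            # TE1 = 1.2 ms, dTE = 1 ms
s = megre_signal(te, water=0.9, fat=0.1, rel_amps=rel_amps, freqs_hz=freqs_hz)
```

Fitting W and F to such echo-train samples is the basis of water-fat separation; simulating each fat peak as its own species, as the phantom here does, makes the simulated signal follow this model exactly.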
Fig. 6. Accuracy of GPU simulations relative to CPU computations for bSSFP banding artifacts induced by susceptibility in a brain phantom (reproducing results from [18]). (a) The reconstructed image from the MPI simulations, which took 116 s. (b) Image simulated using double precision GPU-JEMRIS; the computation took 71 s. (c) Image difference with NRMSE = 0.36% between the double precision MPI and double precision GPU computations. (d) Image simulated using single precision GPU-JEMRIS; the computation took 18 s. (e) Image difference with NRMSE = 0.6% between the double precision MPI and single precision GPU computations. Both figures (c) and (e) indicate noise-like differences between images from CPU-based and GPU-based simulations
Fig. 7. Double and single precision GPU simulations compared to double precision MPI for a geometrical phantom with 2,250,000 spins and a 2D TSE sequence. (a) Received signals for the first two spin echoes in coil channel 1. GPU and CPU results closely match, with a total signal NRMSE of 0.11% (double precision) and 0.10% (single precision). IFFT-reconstructed images from receive coil channel 1: (b) double precision MPI, (c) double precision GPU, and (e) single precision GPU. Difference images: (d) double precision GPU vs. MPI (NRMSE = 7e-5%), (f) single precision GPU vs. MPI (NRMSE = 0.8%). Computation times reduced from 8.5 h (double precision MPI) to 3.5 h (double precision GPU) and 1.25 h (single precision GPU)
Fig. 8. Acceleration and accuracy analysis of GPU simulations with increasing number of spins in a geometrical phantom. (a) Simulation times as a function of the total number of spins (on a logarithmic scale) for double precision MPI (MPI_DP), double and single precision GPU (GPU_DP and GPU_SP) simulations. Speed-up factors of 3-12 were achieved for double precision and 7-65 for single precision GPU simulations vs. double precision MPI simulations. (b) K-space signal normalized root-mean-square error (NRMSE) for double and single precision GPU simulations compared to the MPI reference. The NRMSE was below 0.08% for double precision and between 0.2-0.8% for single precision GPU simulations
Fig. 9. Multi-echo Dixon sequence simulations of a multi-species liver phantom in the presence of motion. (a) Motion trajectory plotted against the acquired phase encoding line. Translations of 3 pixels amplitude in the anterior-posterior direction were applied in the second half of data acquisition. (b) Magnitude images from the static phantom simulation present homogeneous and well-defined structures, while (c) images from the moving phantom simulations contain blurring and ghosting artifacts with similar patterns but varying severity across the echoes
Fig. 10. Fat fraction quantification errors induced by motion. Separated water and fat proton density maps, as well as PDFF maps, for simulations with (a) static phantom and (b) moving phantom. (c) Histogram of estimated PDFF values in the liver region for static and moving phantom simulations. The ground truth liver PDFF was 10%. For the static phantom, the distribution of PDFF is narrow, with mean value 8% and standard deviation 2%. In the presence of motion, the estimated PDFF values are distributed with mean 14% and standard deviation 3%
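The PDFF maps in the figure follow directly from the separated water and fat proton densities. A minimal sketch of the standard magnitude-based computation (the `eps` guard against empty voxels is an implementation detail of this sketch):

```python
import numpy as np

def pdff(water, fat, eps=1e-12):
    """Proton-density fat fraction in percent: 100 * F / (W + F).
    Works elementwise on separated water/fat maps of any shape."""
    water = np.abs(np.asarray(water, dtype=float))
    fat = np.abs(np.asarray(fat, dtype=float))
    return 100.0 * fat / (water + fat + eps)

# A voxel matching the 10% ground-truth liver PDFF of the simulation:
ff = pdff(water=0.9, fat=0.1)
```

Because PDFF is a ratio of the separated components, motion-induced ghosting that corrupts water and fat estimates differently translates directly into the spatially varying PDFF bias shown in the histogram.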

References

    1. Bloch F (1946) Nuclear induction. Phys Rev 70:460-474. doi: 10.1103/PhysRev.70.460
    2. Scholand N, Wang X, Roeloffs V, Rosenzweig S, Uecker M (2023) Quantitative MRI by nonlinear inversion of the Bloch equations. Magn Reson Med 90(2):520-538. doi: 10.1002/mrm.29664
    3. Lauzon M (2020) A beginner's guide to Bloch equation simulations of magnetic resonance imaging sequences. arXiv:2009.02789
    4. Lee PK, Watkins LE, Anderson TI, Buonincontri G, Hargreaves BA (2019) Flexible and efficient optimization of quantitative sequences using automatic differentiation of Bloch simulations. Magn Reson Med 82:1438-1451. doi: 10.1002/mrm.27832
    5. Luo T, Noll DC, Fessler JA, Nielsen JF (2021) Joint design of RF and gradient waveforms via auto-differentiation for 3D tailored excitation in MRI. IEEE Trans Med Imaging 40:3305-3314. doi: 10.1109/TMI.2021.3083104
