MAGMA. 2025 Aug;38(4):679-694.
doi: 10.1007/s10334-025-01281-z. Epub 2025 Sep 4.

GPU-accelerated JEMRIS for extensive MRI simulations

Aizada Nurdinova et al. MAGMA. 2025 Aug.

Abstract

Purpose: To enable accelerated Bloch simulations by enhancing the open-source multi-purpose MRI simulation tool JEMRIS with graphics processing unit (GPU) parallelization.
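The Bloch equations that such simulators integrate can be illustrated with a minimal example. The sketch below (an illustration, not JEMRIS code) evolves a single spin's magnetization in the rotating frame under off-resonance and T1/T2 relaxation with a fixed-step RK4 integrator, a toy stand-in for the adaptive CVODE solver JEMRIS uses:

```python
import numpy as np

def bloch_rhs(m, t, b_z, t1, t2, m0=1.0):
    """Right-hand side of the rotating-frame Bloch equations for one spin:
    precession about a z-field b_z plus T1/T2 relaxation (time-independent here)."""
    gamma = 2 * np.pi * 42.576e6          # 1H gyromagnetic ratio [rad/s/T]
    mx, my, mz = m
    omega = gamma * b_z                   # off-resonance angular frequency [rad/s]
    return np.array([
        omega * my - mx / t2,             # dMx/dt
        -omega * mx - my / t2,            # dMy/dt
        (m0 - mz) / t1,                   # dMz/dt
    ])

def integrate(m, dt, n, **kw):
    """Fixed-step RK4 integration of the Bloch ODE (toy stand-in for CVODE)."""
    for i in range(n):
        t = i * dt
        k1 = bloch_rhs(m, t, **kw)
        k2 = bloch_rhs(m + 0.5 * dt * k1, t + 0.5 * dt, **kw)
        k3 = bloch_rhs(m + 0.5 * dt * k2, t + 0.5 * dt, **kw)
        k4 = bloch_rhs(m + dt * k3, t + dt, **kw)
        m = m + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return m

# Start from transverse magnetization (after a 90° pulse) and relax for 10 ms.
m_end = integrate(np.array([1.0, 0.0, 0.0]), dt=1e-5, n=1000,
                  b_z=0.0, t1=1.0, t2=0.1)
```

On resonance, the result matches the analytic solution Mx(t) = exp(-t/T2) and Mz(t) = 1 - exp(-t/T1); realistic simulations repeat this per spin over the full sequence timeline, which is what makes GPU parallelization attractive.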

Methods: A GPU-compatible version of JEMRIS was built by shifting the computationally expensive parallelizable processes to the GPU to benefit from heterogeneous computing and by adding asynchronous communication and mixed precision support. With key classes reimplemented in CUDA C++, the developed GPU-JEMRIS framework was tested on simulations of common MRI artifacts in numerical phantoms. The accuracy and performance of the GPU-parallelized JEMRIS simulator were benchmarked against the CPU-parallelized JEMRIS and GPU-enabled KomaMRI.jl simulators. Additionally, an example of liver fat quantification errors due to respiratory motion artifacts was simulated in a multi-echo gradient echo (MEGRE) acquisition.
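The core parallelization idea, evaluating the ODE right-hand side for all spins at once instead of looping over them sequentially, can be sketched with a vectorized NumPy analogue of the CUDA kernel (function and variable names here are illustrative, not the JEMRIS API):

```python
import numpy as np

def bloch_rhs_all_spins(m, omega, r1, r2, m0=1.0):
    """Evaluate dM/dt for every spin simultaneously.

    m     : (n_spins, 3) magnetization vectors
    omega : (n_spins,) per-spin off-resonance [rad/s]
    r1,r2 : (n_spins,) relaxation rates 1/T1, 1/T2 [1/s]

    One vectorized expression replaces the per-spin loop, mirroring how a
    CUDA kernel assigns one thread to each spin.
    """
    dm = np.empty_like(m)
    dm[:, 0] = omega * m[:, 1] - r2 * m[:, 0]      # dMx/dt
    dm[:, 1] = -omega * m[:, 0] - r2 * m[:, 1]     # dMy/dt
    dm[:, 2] = r1 * (m0 - m[:, 2])                 # dMz/dt
    return dm

rng = np.random.default_rng(0)
n = 1_000
m = rng.normal(size=(n, 3))
omega = rng.normal(scale=100.0, size=n)
r1, r2 = np.full(n, 1.0), np.full(n, 10.0)

dm = bloch_rhs_all_spins(m, omega, r1, r2)

# Identical to a naive sequential per-spin loop:
dm_loop = np.array([bloch_rhs_all_spins(m[i:i+1], omega[i:i+1],
                                        r1[i:i+1], r2[i:i+1])[0]
                    for i in range(n)])
assert np.allclose(dm, dm_loop)
```

Because each spin's derivative is independent of the others, the computation is embarrassingly parallel; only the coil-signal reduction over spins requires communication between threads.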

Results: The GPU-accelerated JEMRIS achieved speed-up factors of 3-12 and 7-65 using double and single precision numerical integrators, respectively, when compared to the parallelized CPU implementation in the investigated numerical phantom scenarios. While double precision GPU simulations differ negligibly (<0.1% NRMSE) from double precision CPU simulations, single precision simulations still show small errors of up to 1% k-space signal NRMSE. The developed GPU extension enabled computationally demanding motion simulations with a multi-species abdominal phantom and a MEGRE sequence, showing significant and spatially varying fat fraction bias in the presence of motion.
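The accuracy comparisons above rely on a normalized root-mean-square error between signals. A small sketch, assuming one common normalization convention (RMS of the reference; the paper's exact normalization is not stated in this abstract), shows how a single-precision round trip of a complex k-space signal produces the kind of small residual reported:

```python
import numpy as np

def nrmse(reference, test):
    """NRMSE between two complex signals, normalized by the RMS of the
    reference (one common convention; assumption for this sketch)."""
    err = np.sqrt(np.mean(np.abs(test - reference) ** 2))
    return err / np.sqrt(np.mean(np.abs(reference) ** 2))

# Double- vs. single-precision versions of the same synthetic k-space signal.
rng = np.random.default_rng(42)
signal_dp = rng.normal(size=256) + 1j * rng.normal(size=256)
signal_sp = signal_dp.astype(np.complex64).astype(np.complex128)

print(f"float32 round-trip NRMSE: {100 * nrmse(signal_dp, signal_sp):.2e} %")
```

The storage round trip alone gives an NRMSE near float32 machine precision; the larger errors reported in the paper accumulate through many single-precision integration steps.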

Conclusion: By solving the Bloch equations in parallel on the device, accelerated Bloch simulations can be performed on any CUDA-capable GPU using the developed GPU-JEMRIS. This enables further insights into more realistic large-spin-pool MR simulations, such as experiments with large multi-dimensional phantoms, multiple chemical species, and dynamic effects.

Keywords: Bloch simulations; GPU acceleration; JEMRIS; Motion artifacts; Quantitative MRI.


Conflict of interest statement

Declarations. Conflict of interest: Dimitrios Karampinos receives research grant support from Philips Healthcare outside the scope of the present work.

Figures

Fig. 1. Schematic class diagram of GPU-JEMRIS highlighting the main changes for GPU support in dark blue boxes. The following functionalities are modified: methods allocating GPU memory and copying computational-core data to the device are added to the classes World, Sample and Coil. The class Model communicates with the sequence tree on the host and calls the solver on the GPU at the timepoints of interest (TPOI) in RunSequenceTreeGPU(). The numerical solver on the GPU is implemented in Bloch_CV_Model: the function blochGPU() computes the ODE right-hand side for all spins in parallel, eliminating the sequential loop of the original CPU version. The Coil class interpolates input sensitivity maps and keeps them on the device throughout the simulations. The Bloch equations are then solved for all spins at every TPOI, and the Coil method ReceiveGPU performs asynchronous reduction on the device. The Sequence class is not modified for GPU support
Fig. 2. The train of function calls during GPU-JEMRIS simulations in the main and helper GPU streams, as well as in the CPU thread. Note that after GPU memory allocation, the host and CUDA streams run independently. Steps 1-3 involve data preparation, GPU memory allocation, and data transfer to the device (GPU). Steps 4-6 include iteration through the sequence tree on the host, CVODE model re-initialization, and solving Bloch equations on the device for each time point of interest (TPOI). At step 7, the estimated magnetization is multiplied by coil sensitivities and summed on the GPU. Steps 8-9 involve transferring the results from device to CPU, freeing up resources and completing the simulation process
Fig. 3. JEMRIS simulations of motion artifacts: input parameters are a numerical phantom with set magnetic properties (T1, T2, T2*, PD, Δω, χ), a motion trajectory as a function of time, an acquisition sequence, and transmit/receive coil sensitivity maps. The output of the simulator is the bulk magnetization vector at the acquisition timepoints, which can be converted to a k-space matrix by taking the transverse magnetization part, given the phase-encoding order in time
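The caption's final step, arranging the transverse magnetization into a k-space matrix according to the phase-encoding order, can be sketched as follows (shapes and the centric ordering are illustrative, not taken from the paper):

```python
import numpy as np

def signals_to_kspace(mx, my, pe_order, n_read):
    """Arrange acquired samples into a k-space matrix.

    mx, my   : (n_lines * n_read,) real/imag transverse magnetization at the
               readout timepoints, in acquisition order
    pe_order : (n_lines,) phase-encode line index of each acquired shot
    n_read   : readout samples per line
    """
    signal = (mx + 1j * my).reshape(len(pe_order), n_read)
    kspace = np.zeros_like(signal)
    kspace[pe_order, :] = signal          # place each shot at its ky line
    return kspace

# Toy 4-line acquisition with a centric phase-encode order.
pe_order = np.array([2, 1, 3, 0])
mx = np.arange(16, dtype=float)
my = np.zeros(16)
k = signals_to_kspace(mx, my, pe_order, n_read=4)
# image = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(k)))  # reconstruction
```

Reordering by `pe_order` is exactly why motion interacts with the sampling scheme: the same physical displacement corrupts different k-space regions depending on when each line is acquired.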
Fig. 4. Geometrical phantom simulations setup: (a) R1 = 1/T1 and R2 = 1/T2 maps (units [1/s]) of the constructed numerical phantom containing geometrical shapes enumerated 1-7 with MR properties as depicted in table (b). (c) Magnitude and phase maps of the simulated 4-channel reception coil array. (d) Simulated 2D multi-shot TSE sequence exported from the clinical scanner; sequence parameters are provided in Table 1
Fig. 5. Liver phantom simulations setup. (a) DUKE abdominal numerical phantom with assigned magnetic properties: the phantom includes one water and nine fat components with chemical shifts and relative proton density (PD) values varying according to the Hamilton fat model [31]. (b) Translational motion as a function of acquired k-space phase encoding line. (c) Sequence diagram of the quantitative six-echo gradient echo sequence exported from the clinical scanner and converted to the .xml format suitable for JEMRIS. Sequence parameters are provided in Table 1
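A multi-peak fat model of the kind referenced in this caption describes the voxel signal as S(TE) = W + F·Σₚ aₚ·exp(2πi·fₚ·TE). The sketch below uses illustrative peak amplitudes and frequencies (not the exact published Hamilton model values) and omits relaxation for brevity:

```python
import numpy as np

def megre_signal(te, water, fat, rel_amps, freqs_hz):
    """Complex multi-echo gradient-echo signal of a water + multi-peak fat
    voxel (relaxation omitted):  S(TE) = W + F * sum_p a_p * exp(2j*pi*f_p*TE)
    """
    phasor = np.sum(rel_amps[None, :] *
                    np.exp(2j * np.pi * freqs_hz[None, :] * te[:, None]),
                    axis=1)
    return water + fat * phasor

# Illustrative 3-peak fat spectrum (relative amplitudes summing to 1), 6 echoes.
rel_amps = np.array([0.7, 0.2, 0.1])
freqs_hz = np.array([-434.0, -332.0, -63.0])   # example chemical shifts at ~3 T
te = 1.2e-3 + 1.0e-3 * np.arange(6)            # TE1 = 1.2 ms, dTE = 1 ms
s = megre_signal(te, water=0.9, fat=0.1, rel_amps=rel_amps, freqs_hz=freqs_hz)
```

Fitting W and F to such echo-train samples is the basis of water-fat separation; simulating each fat peak as its own species, as the phantom here does, makes the simulated signal follow this model exactly.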
Fig. 6. Accuracy of GPU simulations relative to CPU computations for bSSFP banding artifacts induced by susceptibility in a brain phantom (reproducing results from [18]). (a) The reconstructed image from the MPI simulations, which took 116 s. (b) Image simulated using double precision GPU-JEMRIS; the computation took 71 s. (c) Image difference with NRMSE = 0.36% between the double precision MPI and double precision GPU computations. (d) Image simulated using single precision GPU-JEMRIS; the computation took 18 s. (e) Image difference with NRMSE = 0.6% between the double precision MPI and single precision GPU computations. Both figures (c) and (e) indicate noise-like differences between images from CPU-based and GPU-based simulations
Fig. 7. Double and single precision GPU simulations compared to double precision MPI for a geometrical phantom with 2,250,000 spins and a 2D TSE sequence. (a) Received signals for the first two spin echoes in coil channel 1. GPU and CPU results closely match, with a total signal NRMSE of 0.11% (double precision) and 0.10% (single precision). IFFT-reconstructed images from receive coil channel 1: (b) double precision MPI, (c) double precision GPU, and (e) single precision GPU. Difference images: (d) double precision GPU vs. MPI (NRMSE = 7e-5%), (f) single precision GPU vs. MPI (NRMSE = 0.8%). Computation times reduced from 8.5 h (double precision MPI) to 3.5 h (double precision GPU) and 1.25 h (single precision GPU)
Fig. 8. Acceleration and accuracy analysis of GPU simulations with increasing number of spins in a geometrical phantom. (a) Simulation times as a function of the total number of spins (on a logarithmic scale) for double precision MPI (MPI_DP), double and single precision GPU (GPU_DP and GPU_SP) simulations. Speed-up factors of 3-12 were achieved for double precision and 7-65 for single precision GPU simulations vs. double precision MPI simulations. (b) K-space signal normalized root-mean-square error (NRMSE) for double and single precision GPU simulations compared to the MPI reference. The NRMSE was below 0.08% for double precision and between 0.2-0.8% for single precision GPU simulations
Fig. 9. Multi-echo Dixon sequence simulations of a multi-species liver phantom in the presence of motion. (a) Motion trajectory plotted against the acquired phase encoding line. Translations of 3 pixels amplitude in the anterior-posterior direction were applied in the second half of data acquisition. (b) Magnitude images from the static phantom simulation present homogeneous and well-defined structures, while (c) images from the moving phantom simulations contain blurring and ghosting artifacts with similar patterns but varying severity across the echoes
Fig. 10. Fat fraction quantification errors induced by motion. Separated water and fat proton density maps, as well as PDFF maps, for simulations with (a) static phantom and (b) moving phantom. (c) Histogram of estimated PDFF values in the liver region for static and moving phantom simulations. The ground truth liver PDFF was 10%. For the static phantom, the distribution of PDFF is narrow, with mean value 8% and standard deviation 2%. In the presence of motion, the estimated PDFF values are distributed with mean 14% and standard deviation 3%
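The PDFF maps in the figure follow directly from the separated water and fat proton densities. A minimal sketch of the standard magnitude-based computation (the `eps` guard against empty voxels is an implementation detail of this sketch):

```python
import numpy as np

def pdff(water, fat, eps=1e-12):
    """Proton-density fat fraction in percent: 100 * F / (W + F).
    Works elementwise on separated water/fat maps of any shape."""
    water = np.abs(np.asarray(water, dtype=float))
    fat = np.abs(np.asarray(fat, dtype=float))
    return 100.0 * fat / (water + fat + eps)

# A voxel matching the 10% ground-truth liver PDFF of the simulation:
ff = pdff(water=0.9, fat=0.1)
```

Because PDFF is a ratio of the separated components, motion-induced ghosting that corrupts water and fat estimates differently translates directly into the spatially varying PDFF bias shown in the histogram.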

References

    1. Bloch F (1946) Nuclear induction. Phys Rev 70:460-474. doi: 10.1103/PhysRev.70.460
    2. Scholand N, Wang X, Roeloffs V, Rosenzweig S, Uecker M (2023) Quantitative MRI by nonlinear inversion of the Bloch equations. Magn Reson Med 90(2):520-538. doi: 10.1002/mrm.29664
    3. Lauzon M (2020) A beginner's guide to Bloch equation simulations of magnetic resonance imaging sequences. arXiv:2009.02789
    4. Lee PK, Watkins LE, Anderson TI, Buonincontri G, Hargreaves BA (2019) Flexible and efficient optimization of quantitative sequences using automatic differentiation of Bloch simulations. Magn Reson Med 82:1438-1451. doi: 10.1002/mrm.27832
    5. Luo T, Noll DC, Fessler JA, Nielsen JF (2021) Joint design of RF and gradient waveforms via auto-differentiation for 3D tailored excitation in MRI. IEEE Trans Med Imaging 40:3305-3314. doi: 10.1109/TMI.2021.3083104
