Mach Learn Sci Technol. 2024 Jun 1;5(2):027001.
doi: 10.1088/2632-2153/ad51c9. Epub 2024 Jun 13.

GPU optimization techniques to accelerate optiGAN-a particle simulation GAN


Anirudh Srikanth et al. Mach Learn Sci Technol. 2024.

Abstract

The demand for specialized hardware to train AI models has increased in tandem with the growth in model complexity over recent years. The graphics processing unit (GPU) is one such piece of hardware, capable of parallelizing operations performed on large chunks of data. Companies like Nvidia, AMD, and Google have been constantly scaling up hardware performance as fast as they can. Nevertheless, there is still a gap between the required processing power and the processing capacity of the hardware. To increase hardware utilization, the software has to be optimized too. In this paper, we present general GPU optimization techniques we used to efficiently train the optiGAN model, a generative adversarial network capable of generating multidimensional probability distributions of optical photons at the photodetector face in radiation detectors, on an 8 GB Nvidia Quadro RTX 4000 GPU. We analyze and compare the performance of all the optimizations based on execution time and memory consumption using the Nvidia Nsight Systems profiler tool. The optimizations yielded an approximately 4.5x increase in runtime performance compared to naive training on the GPU, without compromising model performance. Finally, we discuss future work on optiGAN and how we plan to scale the model across GPUs.

Keywords: Monte-Carlo simulation; generative adversarial networks; graphics processing unit; multidimensional probability distributions; performance optimization; radiation detector.


Figures

Figure 1. CPU and GPU architectures.
Figure 2. OptiGAN training dataset. Accurate optical simulations were performed at different emission positions inside a crystal to train and test the conditional generative adversarial network optiGAN. Optical photon distributions (positions, directions, and energy) were stored for 140 emission points in a multidimensional matrix that included the source 3D emission positions. This tabular data was used as the high-fidelity training dataset of optiGAN.
Figure 3. OptiGAN architecture (from Trigila et al (2023)). It consists of a generator (a) and a discriminator/critic network (b) with H = 128 hidden nodes and a ReLU activation function.
Figure 4. Automatic mixed precision training pipeline.
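The pipeline in figure 4 relies on loss scaling to keep small half-precision gradients from underflowing to zero. As a minimal, framework-free sketch of that idea (not the authors' actual training pipeline), the underflow problem and its fix can be demonstrated with NumPy half-precision arithmetic:

```python
import numpy as np

# A gradient too small for fp16: the smallest fp16 subnormal is ~6e-8,
# so 1e-8 flushes to zero when stored in half precision.
grad_fp32 = np.float32(1e-8)
assert np.float16(grad_fp32) == 0.0  # underflow: the update would be lost

# Loss scaling, as used in automatic mixed precision: multiply the loss
# (and hence its gradients) by a large constant before the fp16 part of
# the backward pass, then unscale in fp32 before the optimizer step.
scale = np.float32(2.0 ** 16)
scaled_grad_fp16 = np.float16(grad_fp32 * scale)   # now representable in fp16
recovered = np.float32(scaled_grad_fp16) / scale   # unscaled back in fp32
print(recovered)  # close to the original 1e-8, up to fp16 rounding
```

In practice, frameworks pick and adjust the scale factor dynamically, but the principle is the same: shift small gradients into the representable fp16 range for the half-precision portion of the computation.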
Figure 5. GPU profiling results using Nvidia Nsight Systems. It shows the runtime performance of different sections of the model training (such as data loading and the generator and discriminator training processes), GPU memory usage, GPU utilization (SM active and SM instructions), and tensor core activity. These events were sampled at a rate of 10 kHz. The pink regions consist of several vertical lines representing the percentage of GPU cores active at each moment, and the blue regions represent the percentage of instructions issued by the SMs at each moment.
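Profiles like the one in figure 5 are collected from the command line with the Nsight Systems CLI. A representative invocation (the training script name here is hypothetical) looks like:

```shell
# Trace CUDA API calls, kernels, NVTX ranges, and OS runtime calls,
# with CPU sampling enabled; writes report.nsys-rep for the Nsight GUI.
nsys profile --trace=cuda,nvtx,osrt \
             --sample=cpu \
             -o report \
             python train_optigan.py
```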
Figure 6. Dataloader optimization results. The runtime (22.9 s) dropped to roughly half the previous GPU runtime shown in figure 5. The GPU is active during most of the training duration. However, memory usage was not optimized with this technique.
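A common way to realize the dataloader gains shown above is to overlap batch preparation with computation so the GPU is not left idle between steps. A stdlib-only sketch of a background prefetcher (a simplification for illustration, not the paper's actual dataloading code) could look like:

```python
import queue
import threading

def prefetch(batches, depth=2):
    """Yield batches while a background thread prepares the next ones."""
    q = queue.Queue(maxsize=depth)
    _END = object()  # sentinel marking the end of the stream

    def producer():
        for b in batches:
            q.put(b)  # blocks once `depth` batches are buffered ahead
        q.put(_END)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not _END:
        yield item

# Usage: iterate as usual; loading overlaps with the work in the loop body.
totals = [sum(batch) for batch in prefetch([[1, 2], [3, 4], [5, 6]])]
print(totals)  # [3, 7, 11]
```

The bounded queue caps memory growth, which matters here since, as the caption notes, this technique improves runtime but not memory usage.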
Figure 7. Automatic mixed precision optimization results. The runtime was further reduced (10.4 s) and memory usage dropped to 3.37 GiB by using tensor cores, which specialize in deep learning computation. This also makes it possible to increase the batch size.
Figure 8. optiGAN model execution time and batch size comparison on CPU and GPU.
Figure 9. Execution time comparison of the GPU optimizations.
