J Chem Theory Comput. 2024 Oct 8;20(19):8397-8404.
doi: 10.1021/acs.jctc.4c00903. Epub 2024 Sep 19.

Parallel Implementation of the Density Matrix Renormalization Group Method Achieving a Quarter petaFLOPS Performance on a Single DGX-H100 GPU Node

Andor Menczer et al. J Chem Theory Comput. 2024.

Abstract

We report cutting-edge performance results for a single-node hybrid CPU-multi-GPU implementation of the spin-adapted ab initio Density Matrix Renormalization Group (DMRG) method on current state-of-the-art NVIDIA DGX-H100 architectures. We evaluate the performance of DMRG electronic structure calculations for the active compounds of FeMoco, the primary cofactor of nitrogenase, and of cytochrome P450 (CYP) enzymes, with complete active space (CAS) sizes of up to 113 electrons in 76 orbitals [CAS(113, 76)] and 63 electrons in 58 orbitals [CAS(63, 58)], respectively. We achieve 246 teraFLOPS of sustained performance, an improvement of more than 2.5× over the performance achieved on DGX-A100 architectures and an 80× acceleration compared with an OpenMP-parallelized implementation on a 128-core CPU architecture. Our work highlights the ability of tensor network algorithms to efficiently utilize high-performance multi-GPU hardware and shows that combining tensor networks with modern large-scale GPU accelerators can pave the way toward solving some of the most challenging problems in quantum chemistry and beyond.
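Sustained performance figures such as the 246 teraFLOPS quoted above are conventionally obtained by counting the floating-point operations of the dominant dense kernels and dividing by wall-clock time. The sketch below assumes, for illustration only, that the workload is a list of dense matrix multiplications (GEMMs); the function names and example shapes are hypothetical and are not taken from the authors' code.

```python
from typing import Iterable, Tuple


def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs of a dense (m x k) @ (k x n) product: m*n*k multiply-adds, 2 FLOPs each."""
    return 2 * m * n * k


def sustained_tflops(gemm_shapes: Iterable[Tuple[int, int, int]], wall_seconds: float) -> float:
    """Total counted FLOPs divided by elapsed wall time, in teraFLOPS (1e12 FLOP/s)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    total = sum(gemm_flops(m, n, k) for m, n, k in gemm_shapes)
    return total / wall_seconds / 1e12


if __name__ == "__main__":
    # Hypothetical workload: 100 GEMMs of size 8192^3 completing in 1.0 s of wall time.
    shapes = [(8192, 8192, 8192)] * 100
    print(f"{sustained_tflops(shapes, 1.0):.1f} TFLOPS")
```

Counting FLOPs from kernel shapes keeps the metric independent of how the work is distributed across GPUs; whether the authors count FLOPs exactly this way is not stated in the abstract.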

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Benchmark results from SU(2) spin-adapted, single-node hybrid CPU plus multi-GPU DMRG calculations for the F2 molecule in a CAS(18,18) orbital space, the N2 molecule in a CAS(14,28) space, FeMoco in CAS(54,54) and CAS(113,76) spaces, and P450 in CAS(63,58). The solid lines correspond to calculations performed on a DGX-H100 system; as a reference, the dotted lines trace the results obtained on a DGX-A100 system. The estimated FP64 theoretical upper bound for DGX-A100 is shown by the horizontal dashed line, and the corresponding bound that also includes the specialized tensor core units (TCUs) by the horizontal dash-dotted line. Numbers indicate the corresponding U(1) bond dimension values, which are the same for the dotted and the solid lines.
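In their simplest form, the theoretical upper bounds drawn in Figure 1 are the per-GPU peak rate multiplied by the number of GPUs in the node. The per-GPU values below (9.7 TFLOPS FP64 and 19.5 TFLOPS FP64 with tensor cores) are NVIDIA's published A100 SXM peak figures and are an assumption here; the authors' "estimated" bound may include further factors not stated in this caption.

```python
# Assumed NVIDIA A100 SXM peak FP64 rates (TFLOPS), taken from the public datasheet.
A100_FP64_TFLOPS = 9.7
A100_FP64_TENSOR_CORE_TFLOPS = 19.5
GPUS_PER_DGX_A100 = 8  # a DGX-A100 node hosts eight A100 GPUs


def node_peak_tflops(per_gpu_tflops: float, n_gpus: int) -> float:
    """Naive node-level bound: perfect scaling across all GPUs, no host or I/O overhead."""
    return per_gpu_tflops * n_gpus


fp64_bound = node_peak_tflops(A100_FP64_TFLOPS, GPUS_PER_DGX_A100)
tcu_bound = node_peak_tflops(A100_FP64_TENSOR_CORE_TFLOPS, GPUS_PER_DGX_A100)
print(f"DGX-A100 FP64 bound: {fp64_bound:.1f} TFLOPS")
print(f"DGX-A100 FP64 + TCU bound: {tcu_bound:.1f} TFLOPS")
```

Under these assumptions the sketch gives 77.6 and 156.0 TFLOPS; analogous H100 bounds would substitute the H100 peak rates.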
Figure 2
Total diagonalization time (in minutes, including host-device I/O overhead) of seven DMRG sweeps for the eight-GPU-accelerated diagonalization procedure, for F2 CAS(18,18), N2 CAS(14,28), and FeMoco CAS(54,54) and CAS(113,76), as a function of DMRG bond dimension on A100 (solid dot symbol, ●) and H100 (open symbol, ○) architectures. The solid lines are first-order polynomial fits to selected data sets: the region up to saturation of GPU performance (black) and the region where performance is saturated (red). The fitted exponents for the H100 calculations are 1.05 ± 0.1 and 2.95 ± 0.2, respectively.
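The fitted exponents in Figure 2 describe how the diagonalization time t grows with bond dimension D. A common way to obtain such an exponent, assumed here, is a first-order polynomial fit of log t against log D, whose slope is the exponent p in t ∝ D^p. The timings below are synthetic, not the paper's data.

```python
import numpy as np


def scaling_exponent(bond_dims, times):
    """Slope of a degree-1 least-squares fit of log(time) versus log(bond dimension)."""
    slope, _intercept = np.polyfit(np.log(bond_dims), np.log(times), 1)
    return slope


# Synthetic timings following t = c * D**3 (a cubic regime); the prefactor c is arbitrary.
D = np.array([1024, 2048, 4096, 8192], dtype=float)
t = 1e-9 * D**3
print(f"fitted exponent: {scaling_exponent(D, t):.2f}")
```

Fitting the pre-saturation and saturated regions separately, as the caption describes, yields two exponents rather than one slope averaged across both regimes.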
Figure 3
Scaling of the energy for spin states with total spin 1/2 (left panel), 3/2 (middle panel), and 5/2 (right panel) as a function of the inverse DMRG SU(2) bond dimension for the cytochrome P450 enzyme, for the model spaces CAS(17,15), CAS(25,23), CAS(33,31), CAS(45,41), CAS(47,43), and CAS(63,58) introduced in ref (65), shown in dark blue, red, orange, purple, green, and light blue, respectively. Solid lines are second-order polynomial fits.
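Figure 3 fits the energy as a second-order polynomial in the inverse bond dimension x = 1/D, so an estimate for D → ∞ is the fit's constant term (x = 0). The sketch below assumes that functional form, E(x) = E_inf + a·x + b·x², and uses synthetic energies in Hartree; the function name is hypothetical.

```python
import numpy as np


def extrapolate_energy(bond_dims, energies):
    """Fit E(x) = b*x**2 + a*x + E_inf with x = 1/D and return E_inf (the value at x = 0)."""
    x = 1.0 / np.asarray(bond_dims, dtype=float)
    _b, _a, e_inf = np.polyfit(x, np.asarray(energies, dtype=float), 2)
    return e_inf


# Synthetic data generated from E_inf = -100.0 Hartree with arbitrary a and b.
D = np.array([1000, 2000, 4000, 8000], dtype=float)
E = -100.0 + 5.0 / D + 20.0 / D**2
print(f"E(D -> inf) = {extrapolate_energy(D, E):.6f} Hartree")
```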
Figure 4
Extrapolated (D → ∞) spin gap (mHartree) between the spin 1/2 ground state and the spin 3/2 excited state (left panel) and between the spin 1/2 ground state and the spin 5/2 excited state (right panel) as a function of model CAS spaces of increasing complexity, i.e., with increasing numbers of orbitals and electrons (data from ref (65)).
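The spin gaps in Figure 4 are differences of extrapolated energies. Assuming each spin state has already been extrapolated to infinite bond dimension in Hartree, the gap from the spin 1/2 ground state to a state of spin S is (E_S − E_1/2) × 1000 in mHartree. The function name and energies below are hypothetical placeholders, not values from the paper.

```python
from fractions import Fraction


def spin_gaps_mhartree(extrapolated):
    """Gaps (mHartree) from the S = 1/2 ground state to every other spin state.

    `extrapolated` maps total spin S (as a Fraction) to its extrapolated energy in Hartree.
    """
    ground = extrapolated[Fraction(1, 2)]
    return {s: (e - ground) * 1000.0 for s, e in extrapolated.items() if s != Fraction(1, 2)}


# Hypothetical extrapolated energies (Hartree).
energies = {
    Fraction(1, 2): -100.000,
    Fraction(3, 2): -99.990,
    Fraction(5, 2): -99.975,
}
for spin, gap in spin_gaps_mhartree(energies).items():
    print(f"S = {spin}: gap = {gap:.1f} mHartree")
```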

References

1. Bardeen J.; Brattain W. H. The Transistor, A Semi-Conductor Triode. Phys. Rev. 1948, 74, 230–231. doi:10.1103/PhysRev.74.230.
2. Francesco P.; Mathieu P.; Sénéchal D. Conformal Field Theory; Springer Science & Business Media, 2012.
3. Kohn W.; Sham L. J. Self-Consistent Equations Including Exchange and Correlation Effects. Phys. Rev. 1965, 140, A1133–A1138. doi:10.1103/PhysRev.140.A1133.
4. Cohen A. J.; Mori-Sánchez P.; Yang W. Challenges for Density Functional Theory. Chem. Rev. 2012, 112, 289–320. doi:10.1021/cr200107z.
5. Becke A. D. Perspective: Fifty years of density-functional theory in chemical physics. J. Chem. Phys. 2014, 140, 18A301. doi:10.1063/1.4869598.
