Nat Commun. 2024 Nov 17;15(1):9957.
doi: 10.1038/s41467-024-54281-3.

Dynamical regimes of diffusion models

Giulio Biroli et al. Nat Commun. 2024.

Abstract

We study generative diffusion models in the regime where both the data dimension and the sample size are large, and the score function is trained optimally. Using statistical physics methods, we identify three distinct dynamical regimes during the generative diffusion process. The generative dynamics, starting from pure noise, first encounters a speciation transition, where the broad structure of the data emerges, akin to symmetry breaking in phase transitions. This is followed by a collapse phase, where the dynamics is attracted to a specific training point through a mechanism similar to condensation in a glass phase. The speciation time can be obtained from a spectral analysis of the data's correlation matrix, while the collapse time relates to an excess entropy measure, and reveals the existence of a curse of dimensionality for diffusion models. These theoretical findings are supported by analytical solutions for Gaussian mixtures and confirmed by numerical experiments on real datasets.
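The spectral estimate of the speciation time mentioned in the abstract can be sketched numerically. Equation (4) of the paper is not reproduced on this page, so the sketch below assumes a specific form: an Ornstein-Uhlenbeck forward process in which the data signal decays as e^{-t}, and a speciation time equal to half the logarithm of the largest eigenvalue Λ of the empirical covariance matrix. The heuristic behind that assumption is that the dominant data direction has amplitude of order sqrt(Λ) e^{-t}, which becomes comparable to unit-variance noise when t is about (1/2) log Λ. The function name and the synthetic two-cluster dataset are illustrative choices, not taken from the paper.

```python
import numpy as np

def speciation_time_estimate(data):
    """Return 0.5 * log(top eigenvalue of the empirical covariance).

    ASSUMPTION: this form of the speciation time is not stated on this page;
    it follows from comparing the signal sqrt(Lambda) * exp(-t) to unit noise.
    data has shape (n_samples, dimension).
    """
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / data.shape[0]
    top_eigenvalue = np.linalg.eigvalsh(cov)[-1]  # eigvalsh sorts ascending
    return 0.5 * np.log(top_eigenvalue)

# Synthetic two-cluster Gaussian mixture with centers +m and -m (illustrative).
rng = np.random.default_rng(0)
d, n, sigma = 64, 4000, 1.0
m = np.ones(d)                                  # |m|^2 = d
labels = rng.choice([-1.0, 1.0], size=n)
data = labels[:, None] * m + sigma * rng.standard_normal((n, d))

t_s = speciation_time_estimate(data)
print(f"estimated speciation time: {t_s:.3f}")
print(f"0.5 * log(|m|^2 + sigma^2): {0.5 * np.log(d + sigma**2):.3f}")
```

For this mixture the top covariance eigenvalue is close to |m|^2 + σ^2, so the estimate grows like (1/2) log d, which is one way to see why the speciation time depends on the data dimension.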

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Illustration of the three regimes of the backward dynamics through an example corresponding to a Gaussian mixture in two dimensions.
Trajectories are colored white and blue according to their class at the end of the backward dynamics. In regime I, blue and white trajectories fluctuate within the same bundle and x is similar to white noise. At the speciation time tS, the ensembles of blue and white trajectories divide and head towards the distribution associated with their respective class. In regime II, the generative process constructs an x that resembles an element of the class (e.g., a seashore in the illustration) without being linked to any particular data point of the training set. At the collapse time tC, trajectories start to be attracted by the data point onto which they collapse at t = 0. Regime III corresponds to memorization, whereas in regimes I and II the diffusion model truly generalizes. The images on the right and on the left are illustrations obtained from our ImageNet numerical experiment (notice the collapse on the panda and seashore from the training set at t = 0).
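The backward dynamics described in this caption can be reproduced qualitatively for a two-cluster Gaussian mixture in two dimensions, where the score is known in closed form. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes the forward process dx = -x dt + sqrt(2) dW, two equally weighted clusters centered at +m and -m with isotropic variance σ^2, and a plain Euler-Maruyama integrator. The values of m, σ, the starting time and the step count are arbitrary.

```python
import numpy as np

def mixture_score(x, t, m, sigma):
    """Exact score of 0.5*N(m e^-t, G I) + 0.5*N(-m e^-t, G I), G = Gamma_t.

    ASSUMPTION: forward process dx = -x dt + sqrt(2) dW, so that
    Gamma_t = sigma^2 e^{-2t} + (1 - e^{-2t}).
    x has shape (batch, dim); m has shape (dim,).
    """
    gamma = sigma**2 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)
    m_t = m * np.exp(-t)
    proj = x @ m_t / gamma
    return (-x + np.tanh(proj)[:, None] * m_t) / gamma

def run_backward(x, t_start, t_end, m, sigma, rng, n_steps=400):
    """Euler-Maruyama integration of the time-reversed SDE from t_start to t_end."""
    h = (t_start - t_end) / n_steps
    t = t_start
    for _ in range(n_steps):
        drift = x + 2.0 * mixture_score(x, t, m, sigma)
        x = x + drift * h + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)
        t -= h
    return x

rng = np.random.default_rng(0)
m = np.array([2.0, 2.0])      # illustrative cluster center
sigma = 1.0
x_noise = rng.standard_normal((2000, 2))   # at t = 4 the law is close to N(0, I)
final = run_backward(x_noise, 4.0, 0.0, m, sigma, rng)

# Class proxy: sign of the projection onto the axis joining the two centers.
proj = final @ m / np.linalg.norm(m)
frac_plus = float(np.mean(proj > 0))
mean_abs_proj = float(np.mean(np.abs(proj)))
print(f"fraction in +m cluster: {frac_plus:.3f}")
print(f"mean |projection|: {mean_abs_proj:.3f}, |m| = {np.linalg.norm(m):.3f}")
```

Because the exact score of the population distribution is used here, the sketch only exhibits regimes I and II; the collapse onto training points in regime III requires the empirical score built from a finite training set.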
Fig. 2. Speciation in Gaussian mixtures.
Evolution of the probability ϕ(t) that the two clones end up in the same cluster as a function of t/tS for several values of d at fixed μ̃ = 1 and σ = 1. The solid line corresponds to the evaluation of (6), while the dots are obtained by sampling 10,000 clone trajectories. The vertical (resp. horizontal) dashed line corresponds to t/tS = 1 (resp. ϕ(t) = 0.775). Error bars correspond to three times the standard error.
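The clone experiment behind ϕ(t) can be sketched as follows: run the backward dynamics from pure noise down to time t, duplicate the state, let the two copies evolve with independent noise down to t = 0, and record whether they land in the same cluster. This is a hedged sketch rather than the paper's protocol. It assumes the forward process dx = -x dt + sqrt(2) dW, the exact score of a symmetric two-cluster mixture, and the sign of the projection onto m as a proxy for the final cluster; the function names and parameter values are illustrative.

```python
import numpy as np

def mixture_score(x, t, m, sigma):
    """Exact score of the noised symmetric two-cluster mixture (assumed model)."""
    gamma = sigma**2 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)
    m_t = m * np.exp(-t)
    return (-x + np.tanh(x @ m_t / gamma)[:, None] * m_t) / gamma

def run_backward(x, t_start, t_end, m, sigma, rng, dt=0.01):
    """Euler-Maruyama integration of the reversed SDE from t_start to t_end."""
    n_steps = max(1, int(np.ceil((t_start - t_end) / dt)))
    h = (t_start - t_end) / n_steps
    t = t_start
    for _ in range(n_steps):
        drift = x + 2.0 * mixture_score(x, t, m, sigma)
        x = x + drift * h + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)
        t -= h
    return x

def same_cluster_probability(t_clone, m, sigma, rng, n_pairs=2000, t_max=4.0):
    """Estimate phi(t_clone): share of clone pairs ending in the same cluster."""
    x = rng.standard_normal((n_pairs, m.size))       # approx. law at t_max
    x = run_backward(x, t_max, t_clone, m, sigma, rng)
    clone_a = run_backward(x, t_clone, 0.0, m, sigma, rng)  # independent noise
    clone_b = run_backward(x, t_clone, 0.0, m, sigma, rng)
    return float(np.mean(np.sign(clone_a @ m) == np.sign(clone_b @ m)))

rng = np.random.default_rng(1)
m = np.array([2.0, 2.0])
sigma = 1.0
phi_late = same_cluster_probability(3.0, m, sigma, rng)    # cloned in regime I
phi_early = same_cluster_probability(0.05, m, sigma, rng)  # cloned near t = 0
print(f"phi(t=3.0)  = {phi_late:.3f}")
print(f"phi(t=0.05) = {phi_early:.3f}")
```

Clones separated while the state is still noise-like choose their cluster almost independently, so ϕ stays near 1/2; clones separated close to t = 0 have already committed to a cluster, so ϕ approaches 1.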
Fig. 3. Collapse in Gaussian mixtures.
Evolution of the excess entropy density fe(t)/α as a function of time t for several values of d, at fixed n = 20,000. The solid lines are the theoretical predictions while the dots show the results of the numerical evaluation approximating the entropy from the dataset. The vertical dashed lines represent the collapse time tC predicted analytically for Gaussian mixtures given in (10). Error bars correspond to thrice the standard error.
Fig. 4. Speciation in realistic datasets.
Evolution of ϕ(t), the probability that the two clones end up in the same class, as a function of t/tS for several image datasets. The values of tS are the theoretical predictions for the speciation time obtained using (4) and listed in Table 1. The dashed horizontal line indicates ϕ(t) = 0.775, the error bars correspond to three times the standard error, and the solid lines linearly interpolate the experimental points.
Fig. 5. Collapse in realistic datasets (ImageNet16, ImageNet32 and LSUN).
(Top-left) Evolution of ϕC(t), the probability that two cloned trajectories collapse on the same training data point at time zero. (Top-right) Histograms of t^C derived from the last-changing indices μ on 4,000 generated samples for the LSUN dataset trained with n = 200. (Bottom) Evolution of the empirical excess entropy f(t)/α. In all panels, the colored vertical dashed lines indicate the average of t^C. The error bars correspond to three times the standard error.
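The collapse onto a single training point can be illustrated with the empirical noised distribution, which is a mixture of n Gaussians centered on the rescaled training points. The sketch below is an illustration of the mechanism only; it is not the excess entropy estimator or the last-changing-index procedure used for this figure. It assumes the forward process dx = -x dt + sqrt(2) dW, so that each training point a contributes a Gaussian with mean a e^{-t} and variance 1 - e^{-2t}, and it uses synthetic standard-normal training data with arbitrary n and d.

```python
import numpy as np

def posterior_weights(x, t, train):
    """Posterior weight of each training point given the noised state x at time t.

    ASSUMPTION: forward process dx = -x dt + sqrt(2) dW, so the empirical
    noised law is (1/n) * sum_mu N(a_mu e^{-t}, (1 - e^{-2t}) I). The empirical
    score is the weighted average of (a_mu e^{-t} - x) / (1 - e^{-2t}).
    """
    delta = 1.0 - np.exp(-2.0 * t)
    sq_dist = np.sum((x[None, :] - np.exp(-t) * train) ** 2, axis=1)
    logits = -sq_dist / (2.0 * delta)
    logits -= logits.max()                 # stabilize the softmax
    w = np.exp(logits)
    return w / w.sum()

rng = np.random.default_rng(0)
n, d = 200, 16                             # illustrative sizes
train = rng.standard_normal((n, d))        # synthetic training set
z = rng.standard_normal(d)                 # one fixed noise draw

max_weight = {}
for t in (3.0, 1.0, 0.3, 0.01):
    # Noised version of training point 0 at time t.
    x_t = np.exp(-t) * train[0] + np.sqrt(1.0 - np.exp(-2.0 * t)) * z
    max_weight[t] = float(posterior_weights(x_t, t, train).max())
    print(f"t = {t:4.2f}  largest posterior weight = {max_weight[t]:.4f}")
```

At large t many training points carry comparable weight, so the empirical score behaves like a smooth population score. As t decreases, the weight concentrates on a single training point, and from then on the empirical score pulls the trajectory towards that point, which is the memorization behavior the captions describe.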
