Sci Rep. 2024 Oct 8;14(1):23383. doi: 10.1038/s41598-024-74045-9.

Deep learning as Ricci flow

Anthony Baptista et al. Sci Rep.

Abstract

Deep neural networks (DNNs) are powerful tools for approximating the distribution of complex data. It is known that data passing through a trained DNN classifier undergoes a series of geometric and topological simplifications. While some progress has been made toward understanding these transformations in neural networks with smooth activation functions, an understanding in the more general setting of non-smooth activation functions, such as the rectified linear unit (ReLU), which tend to perform better, is required. Here we propose that the geometric transformations performed by DNNs during classification tasks have parallels to those expected under Hamilton's Ricci flow, a tool from differential geometry that evolves a manifold by smoothing its curvature in order to identify its topology. To illustrate this idea, we present a computational framework to quantify the geometric changes that occur as data passes through successive layers of a DNN, and use this framework to motivate a notion of 'global Ricci network flow' that can be used to assess a DNN's ability to disentangle complex data geometries to solve classification problems. By training more than 1500 DNN classifiers of different widths and depths on synthetic and real-world data, we show that the strength of global Ricci network flow-like behaviour correlates with accuracy for well-trained DNNs, independently of depth, width and data set. Our findings motivate the application of tools from differential and discrete geometry to the problem of explainability in deep learning.
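
As a rough, hypothetical illustration of such a framework (not the authors' published pipeline), one can build a k-nearest-neighbour graph on each layer's activations and track a simple discrete curvature through the network. The sketch below uses the combinatorial Forman curvature F(u, v) = 4 - deg(u) - deg(v) as a stand-in for whichever discrete curvature the paper employs; the array shapes, the choice of k and all function names are illustrative assumptions.

    # Illustrative sketch only: a k-NN graph per layer and a simple
    # combinatorial Forman curvature, F(u, v) = 4 - deg(u) - deg(v),
    # as a stand-in for the paper's discrete curvature notion.
    import numpy as np
    import networkx as nx
    from sklearn.neighbors import kneighbors_graph

    def knn_graph(acts, k):
        """Undirected k-NN graph on one layer's activations (n_points, n_dims)."""
        adj = kneighbors_graph(acts, n_neighbors=k, mode="connectivity")
        return nx.from_scipy_sparse_array(adj)  # symmetrised, undirected

    def total_forman_curvature(graph):
        """Sum of F(u, v) = 4 - deg(u) - deg(v) over all edges."""
        deg = dict(graph.degree())
        return sum(4 - deg[u] - deg[v] for u, v in graph.edges())

    # layer_acts: hypothetical list of per-layer activation arrays for a
    # test set passed through a trained DNN.
    # curvatures = [total_forman_curvature(knn_graph(a, k=15)) for a in layer_acts]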

Keywords: Complex network; Deep learning; Differential geometry; Ricci flow.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Deep learning and Ricci flow. (A) An example of deep learning. The structure of two non-linearly separable, entwined, manifolds is learned by a deep neural network (DNN). A test set, consisting of random samples drawn from the two manifolds, is passed through the trained DNN and the output of each layer is visualised via its first two principal components. As the test set passes through the layers of the trained DNN, irregularities in the geometry of the data are smoothed, and the two manifolds are separated. (B) An example of Ricci flow. An irregular manifold, consisting of two generally positively curved regions joined by a region of negative curvature, evolves according to a Ricci flow. The irregularities on the positively curved regions are smoothed and the negatively curved region expands, separating them.
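
A minimal sketch of the layer-wise visualisation described in panel (A), assuming a PyTorch nn.Sequential model and a test tensor x_test (both hypothetical names):

    # Sketch: collect each layer's output on a test set and project it
    # onto its first two principal components, as in panel (A).
    import torch
    from sklearn.decomposition import PCA

    def layer_outputs(model, x):
        """Return the output of every layer of a Sequential model on input x."""
        outs = []
        with torch.no_grad():
            for layer in model:
                x = layer(x)
                outs.append(x.clone())
        return outs

    # x_test: hypothetical (n_points, n_features) test tensor
    # for h in layer_outputs(model, x_test):
    #     proj = PCA(n_components=2).fit_transform(h.numpy())
    #     # scatter-plot proj, coloured by class, one panel per layer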
Fig. 2
Data sets for binary classification and DNN architectures trained. (A) Three synthetic data sets A, B and C describe binary classification problems with different degrees of geometric and topological entanglement. We also considered two binary classification problems from the MNIST data set: distinguishing similar-looking numbers (‘1’ vs ‘7’ and ‘6’ vs ‘8’). Finally, we considered two binary classification problems from the fashion MNIST data set: distinguishing similar-looking items of clothing (‘sandals’ vs ‘ankle boots’ and ‘shirts’ vs ‘coats’). (B) For each problem, three different DNN widths were considered: narrow (25 nodes wide); wide (50 nodes wide); and bottleneck, as shown. For each choice of width, two depths were trained: shallow (5 hidden layers) and deep (11 hidden layers).
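
A sketch of the architecture families in panel (B) as ReLU multilayer perceptrons; the exact bottleneck width profile is not given in the caption, so the one below is an assumption, as is the 2-D input for the synthetic problems:

    # Sketch: ReLU MLPs matching the widths/depths of panel (B).
    # The bottleneck profile and the 2-D input are assumptions.
    import torch.nn as nn

    def mlp(in_dim, hidden_widths, n_classes=2):
        layers, prev = [], in_dim
        for w in hidden_widths:
            layers += [nn.Linear(prev, w), nn.ReLU()]
            prev = w
        layers.append(nn.Linear(prev, n_classes))
        return nn.Sequential(*layers)

    narrow_shallow = mlp(2, [25] * 5)    # narrow: 25 nodes, 5 hidden layers
    wide_deep      = mlp(2, [50] * 11)   # wide: 50 nodes, 11 hidden layers
    bottleneck     = mlp(2, [50, 40, 30, 40, 50])  # assumed profile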
Fig. 3
Ricci flow-like behaviour and the number of nearest neighbours k. (A) Heatmap of aggregated Ricci coefficients, computed across 25 DNNs of a given width and depth trained on a given data set, for various values of k, evaluated on synthetic test data sets A, B and C of 1000 points. (B) Heatmap of aggregated Ricci coefficients, computed across 25 DNNs of a given width and depth trained on a given data set, for various values of k, evaluated on binary comparisons in the MNIST and fMNIST data sets with test data sets of 2000 points. Black boxes outline the value of k yielding the most negative aggregated Ricci coefficient. For both heatmaps, the dendrogram shows the results of hierarchical clustering using the aggregated Ricci coefficients as features. We see that higher values of k are required to observe Ricci flow-like behaviour in the MNIST data sets compared to the synthetic sets.
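
One plausible reading of the per-network coefficient behind these heatmaps, before aggregation over the 25 trained DNNs, is sketched below: the Pearson correlation between total curvature at layer l and the change in total pairwise distance from layer l to l+1, swept over k. This reuses knn_graph and total_forman_curvature from the sketch after the abstract; both the definition and the k range are assumptions, not the paper's stated formula.

    # Sketch: a per-network Ricci coefficient for each k, here read as the
    # correlation between curvature at layer l and the subsequent change
    # in total pairwise distance. Assumed definition, not the paper's.
    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import pearsonr

    def ricci_coefficient(layer_acts, k):
        curv = np.array([total_forman_curvature(knn_graph(a, k))
                         for a in layer_acts])
        dist = np.array([pdist(a).sum() for a in layer_acts])
        rho, _ = pearsonr(curv[:-1], np.diff(dist))  # pair layer l with l -> l+1
        return rho

    # coeffs = {k: ricci_coefficient(layer_acts, k) for k in range(5, 55, 5)}
    # boxed_k = min(coeffs, key=coeffs.get)  # most negative coefficient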
Fig. 4
Ricci flow-like behaviour has different implications for different data sets. Scatter plots display total curvature at layer l against total change in distance between point pairs between layers l and l+1, for each data set and various DNN architectures. The Ricci coefficient (ρ) for each DNN is presented on each plot. For the synthetic data sets, total distance change between points increases through the layers and curvature drops, implying a separation of points from different classes. Conversely, for MNIST and fMNIST, total distance change decreases through the layers and curvature increases, implying an aggregation of points from the same class.
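
The per-layer points of one such scatter plot could be assembled as follows, using the same hypothetical quantities as the sketches above; the matplotlib styling is illustrative.

    # Sketch: points for one Fig. 4-style scatter plot: total curvature
    # at layer l (x-axis) against the change in total pairwise distance
    # from layer l to l+1 (y-axis). Quantities as in the earlier sketches.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.spatial.distance import pdist

    def plot_curvature_vs_distance_change(layer_acts, layer_curvatures):
        dist = np.array([pdist(a).sum() for a in layer_acts])
        plt.scatter(layer_curvatures[:-1], np.diff(dist))
        plt.xlabel("total curvature at layer l")
        plt.ylabel("total distance change, layer l to l+1")
        plt.show()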

