Sci Rep. 2024 Oct 8;14(1):23383. doi: 10.1038/s41598-024-74045-9.

Deep learning as Ricci flow

Anthony Baptista et al. Sci Rep.

Abstract

Deep neural networks (DNNs) are powerful tools for approximating the distribution of complex data. It is known that data passing through a trained DNN classifier undergoes a series of geometric and topological simplifications. While some progress has been made toward understanding these transformations in neural networks with smooth activation functions, an understanding in the more general setting of non-smooth activation functions, such as the rectified linear unit (ReLU), which tend to perform better, is required. Here we propose that the geometric transformations performed by DNNs during classification tasks have parallels to those expected under Hamilton's Ricci flow, a tool from differential geometry that evolves a manifold by smoothing its curvature in order to identify its topology. To illustrate this idea, we present a computational framework to quantify the geometric changes that occur as data passes through successive layers of a DNN, and use this framework to motivate a notion of 'global Ricci network flow' that can be used to assess a DNN's ability to disentangle complex data geometries to solve classification problems. By training more than 1500 DNN classifiers of different widths and depths on synthetic and real-world data, we show that the strength of global Ricci network flow-like behaviour correlates with accuracy for well-trained DNNs, independently of depth, width and data set. Our findings motivate the application of tools from differential and discrete geometry to the problem of explainability in deep learning.
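
As a rough, hypothetical illustration of such a framework (not the authors' published pipeline), one can build a k-nearest-neighbour graph on each layer's activations and track a simple discrete curvature through the network. The sketch below uses the combinatorial Forman curvature F(u, v) = 4 - deg(u) - deg(v) as a stand-in for whichever discrete curvature the paper employs; the array shapes, the choice of k and all function names are illustrative assumptions.

    # Illustrative sketch only: a k-NN graph per layer and a simple
    # combinatorial Forman curvature, F(u, v) = 4 - deg(u) - deg(v),
    # as a stand-in for the paper's discrete curvature notion.
    import numpy as np
    import networkx as nx
    from sklearn.neighbors import kneighbors_graph

    def knn_graph(acts, k):
        """Undirected k-NN graph on one layer's activations (n_points, n_dims)."""
        adj = kneighbors_graph(acts, n_neighbors=k, mode="connectivity")
        return nx.from_scipy_sparse_array(adj)  # symmetrised, undirected

    def total_forman_curvature(graph):
        """Sum of F(u, v) = 4 - deg(u) - deg(v) over all edges."""
        deg = dict(graph.degree())
        return sum(4 - deg[u] - deg[v] for u, v in graph.edges())

    # layer_acts: hypothetical list of per-layer activation arrays for a
    # test set passed through a trained DNN.
    # curvatures = [total_forman_curvature(knn_graph(a, k=15)) for a in layer_acts]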

Keywords: Complex network; Deep learning; Differential geometry; Ricci flow.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Deep learning and Ricci flow. (A) An example of deep learning. The structure of two non-linearly separable, entwined, manifolds is learned by a deep neural network (DNN). A test set, consisting of random samples drawn from the two manifolds, is passed through the trained DNN and the output of each layer is visualised via its first two principal components. As the test set passes through the layers of the trained DNN, irregularities in the geometry of the data are smoothed, and the two manifolds are separated. (B) An example of Ricci flow. An irregular manifold, consisting of two generally positively curved regions joined by a region of negative curvature, evolves according to a Ricci flow. The irregularities on the positively curved regions are smoothed and the negatively curved region expands, separating them.
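
A minimal sketch of the layer-wise visualisation described in panel (A), assuming a PyTorch nn.Sequential model and a test tensor x_test (both hypothetical names):

    # Sketch: collect each layer's output on a test set and project it
    # onto its first two principal components, as in panel (A).
    import torch
    from sklearn.decomposition import PCA

    def layer_outputs(model, x):
        """Return the output of every layer of a Sequential model on input x."""
        outs = []
        with torch.no_grad():
            for layer in model:
                x = layer(x)
                outs.append(x.clone())
        return outs

    # x_test: hypothetical (n_points, n_features) test tensor
    # for h in layer_outputs(model, x_test):
    #     proj = PCA(n_components=2).fit_transform(h.numpy())
    #     # scatter-plot proj, coloured by class, one panel per layer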
Fig. 2
Data sets for binary classification and DNN architectures trained. (A) Three synthetic data sets A, B and C describe binary classification problems with different degrees of geometric and topological entanglement. We also considered two binary classification problems from the MNIST data set: distinguishing similar-looking numbers (‘1’ vs ‘7’ and ‘6’ vs ‘8’). Finally, we considered two binary classification problems from the fashion MNIST data set: distinguishing similar-looking items of clothing (‘sandals’ vs ‘ankle boots’ and ‘shirts’ vs ‘coats’). (B) For each problem, three different DNN widths were considered: narrow (25 nodes wide); wide (50 nodes wide); and bottleneck, as shown. For each choice of width, two depths were trained: shallow (5 hidden layers) and deep (11 hidden layers).
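
A sketch of the architecture families in panel (B) as ReLU multilayer perceptrons; the exact bottleneck width profile is not given in the caption, so the one below is an assumption, as is the 2-D input for the synthetic problems:

    # Sketch: ReLU MLPs matching the widths/depths of panel (B).
    # The bottleneck profile and the 2-D input are assumptions.
    import torch.nn as nn

    def mlp(in_dim, hidden_widths, n_classes=2):
        layers, prev = [], in_dim
        for w in hidden_widths:
            layers += [nn.Linear(prev, w), nn.ReLU()]
            prev = w
        layers.append(nn.Linear(prev, n_classes))
        return nn.Sequential(*layers)

    narrow_shallow = mlp(2, [25] * 5)    # narrow: 25 nodes, 5 hidden layers
    wide_deep      = mlp(2, [50] * 11)   # wide: 50 nodes, 11 hidden layers
    bottleneck     = mlp(2, [50, 40, 30, 40, 50])  # assumed profile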
Fig. 3
Ricci flow-like behaviour and the number of nearest neighbours k. (A) Heatmap of aggregated Ricci coefficients, computed across 25 DNNs of a given width and depth trained on a given data set, for various values of k, evaluated on synthetic test data sets A, B and C of 1000 points. (B) Heatmap of aggregated Ricci coefficients, computed across 25 DNNs of a given width and depth trained on a given data set, for various values of k, evaluated on binary comparisons in the MNIST and fMNIST data sets with test data sets of 2000 points. Black boxes outline the value of k yielding the most negative aggregated Ricci coefficient. For both heatmaps, the dendrogram shows the results of hierarchical clustering using the aggregated Ricci coefficients as features. We see that higher values of k are required to observe Ricci flow-like behaviour in the MNIST data sets compared to the synthetic sets.
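
One plausible reading of the per-network coefficient behind these heatmaps, before aggregation over the 25 trained DNNs, is sketched below: the Pearson correlation between total curvature at layer l and the change in total pairwise distance from layer l to l+1, swept over k. This reuses knn_graph and total_forman_curvature from the sketch after the abstract; both the definition and the k range are assumptions, not the paper's stated formula.

    # Sketch: a per-network Ricci coefficient for each k, here read as the
    # correlation between curvature at layer l and the subsequent change
    # in total pairwise distance. Assumed definition, not the paper's.
    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import pearsonr

    def ricci_coefficient(layer_acts, k):
        curv = np.array([total_forman_curvature(knn_graph(a, k))
                         for a in layer_acts])
        dist = np.array([pdist(a).sum() for a in layer_acts])
        rho, _ = pearsonr(curv[:-1], np.diff(dist))  # pair layer l with l -> l+1
        return rho

    # coeffs = {k: ricci_coefficient(layer_acts, k) for k in range(5, 55, 5)}
    # boxed_k = min(coeffs, key=coeffs.get)  # most negative coefficient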
Fig. 4
Ricci flow-like behaviour has different implications for different data sets. Scatter plots display total curvature at layer l against total change in distance between point pairs between layers l and l+1, for each data set and various DNN architectures. The Ricci coefficient (ρ) for each DNN is presented on each plot. For the synthetic data sets, total distance change between points increases through the layers and curvature drops, implying a separation of points from different classes. Conversely, for MNIST and fMNIST, total distance change decreases through the layers and curvature increases, implying an aggregation of points from the same class.
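
The per-layer points of one such scatter plot could be assembled as follows, using the same hypothetical quantities as the sketches above; the matplotlib styling is illustrative.

    # Sketch: points for one Fig. 4-style scatter plot: total curvature
    # at layer l (x-axis) against the change in total pairwise distance
    # from layer l to l+1 (y-axis). Quantities as in the earlier sketches.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.spatial.distance import pdist

    def plot_curvature_vs_distance_change(layer_acts, layer_curvatures):
        dist = np.array([pdist(a).sum() for a in layer_acts])
        plt.scatter(layer_curvatures[:-1], np.diff(dist))
        plt.xlabel("total curvature at layer l")
        plt.ylabel("total distance change, layer l to l+1")
        plt.show()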

