Stat Comput. 2022;32(6):99. doi: 10.1007/s11222-022-10169-0. Epub 2022 Oct 22.

A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks


Xiaoyu Ma et al. Stat Comput. 2022.

Abstract

To fit sparse linear associations, a LASSO sparsity-inducing penalty with a single hyperparameter provably allows recovery of the important features (needles) with high probability in certain regimes, even if the sample size is smaller than the dimension of the input vector (haystack). More recently, learners known as artificial neural networks (ANNs) have shown great success in many machine learning tasks, in particular in fitting nonlinear associations. A small learning rate, the stochastic gradient descent algorithm, and a large training set help to cope with the explosion in the number of parameters present in deep neural networks. Yet few ANN learners have been developed and studied to find needles in nonlinear haystacks. Driven by a single hyperparameter, our ANN learner, like the LASSO for sparse linear associations, exhibits a phase transition in the probability of retrieving the needles, which we do not observe with other ANN learners. To select our penalty parameter, we generalize the universal threshold of Donoho and Johnstone (Biometrika 81(3):425-455, 1994), a rule preferable to cross-validation, which is conservative (too many false detections) and expensive. In the spirit of simulated annealing, we propose a warm-start sparsity-inducing algorithm to solve the high-dimensional, non-convex and non-differentiable optimization problem. We perform Monte Carlo experiments on simulated and real data to quantify the effectiveness of our approach.
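The abstract describes a LASSO-penalized ANN whose single penalty parameter is set by a generalization of the universal threshold rather than by cross-validation. The sketch below is only an illustration of that general idea, not the authors' implementation: it fits a one-hidden-layer network with an L1 penalty on the input-layer weights (so whole input features can be switched off), and picks the penalty level as a Monte Carlo quantile of a zero-thresholding statistic under a null model with no needles, mimicking the linear-LASSO version of a universal-threshold rule. The class LassoANN, the helpers fit and qut_lambda, the hidden-layer width, the tanh activation, and the thresholding of small weights are all illustrative assumptions.

# Hedged sketch (not the paper's code): an L1-penalized one-hidden-layer ANN
# for feature selection, with the penalty level chosen by a Monte Carlo
# quantile under a "no needles" null model. Requires PyTorch.
import torch
import torch.nn as nn


class LassoANN(nn.Module):
    """One-hidden-layer network; L1-penalizing the input-layer weight matrix
    lets entire input features (columns of weights) be switched off."""

    def __init__(self, p, hidden=20):
        super().__init__()
        self.layer1 = nn.Linear(p, hidden)
        self.layer2 = nn.Linear(hidden, 1)

    def forward(self, x):
        return self.layer2(torch.tanh(self.layer1(x))).squeeze(-1)


def qut_lambda(X, n_mc=1000, alpha=0.05):
    """Monte Carlo proxy for a universal-threshold-type penalty: the
    (1 - alpha)-quantile of ||X^T eps||_inf / n when the response is pure
    noise (no needles)."""
    n, _ = X.shape
    stats = torch.stack([
        torch.abs(X.T @ torch.randn(n)).max() / n for _ in range(n_mc)
    ])
    return torch.quantile(stats, 1 - alpha).item()


def fit(model, X, y, lam, epochs=2000, lr=1e-2):
    """Minimize mean squared error plus lam * ||input-layer weights||_1."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.mean((model(X) - y) ** 2)
        penalty = lam * model.layer1.weight.abs().sum()
        (loss + penalty).backward()
        opt.step()
    return model


# Toy usage: n = 50 observations, p = 200 features, only 2 true needles.
torch.manual_seed(0)
n, p = 50, 200
X = torch.randn(n, p)
y = torch.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * torch.randn(n)

lam = qut_lambda(X)                  # data-driven penalty, no validation set
model = fit(LassoANN(p), X, y, lam)

# Plain Adam does not produce exact zeros (no proximal step), so we simply
# threshold small input-layer weight columns to read off selected features.
col_norm = model.layer1.weight.abs().sum(dim=0)
print("selected features:", (col_norm > 1e-3).nonzero().flatten().tolist())

In practice, a proximal (soft-thresholding) update or an ISTA/FISTA-style solver would yield exact zeros and a cleaner support estimate, and the warm-start, annealing-style schedule mentioned in the abstract would be layered on top of such a solver.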

Keywords: Model selection; Neural networks; Phase transition; Sparsity; Universal threshold.


Figures

Fig. 1
Monte-Carlo simulation results for linear association plotting the estimated probability of exact support recovery (PESR). Left plot: the two black curves assume a linear model while the colored curves assume an ANN model; the two blue lines (light for spinn and dark for spinn-dropout) and the grey line (BNN) are governed by more than one hyperparameter, while the red line (LASSO ANN) is governed by a single hyperparameter; the two continuous lines (black for square-root LASSO linear and red for LASSO ANN) select the hyperparameter with QUT, while the dashed lines require a validation set. Right plot: LASSO ANN with 2 to 4 layers with its hyperparameter based on QUT.
Fig. 2
Monte-Carlo simulation results for nonlinear association plotting the estimated probability of exact support recovery (PESR, top left), generalization (PE, top right), true positive rate (TPR, bottom left) and false discovery rate (FDR, bottom right). The red curves are for LASSO ANN with its hyperparameter based on QUT, with two (continuous) to four (dashed) layers. The two blue lines (light for spinn and dark for spinn-dropout) and the grey line (BNN) are governed by more than one hyperparameter selected based on a validation set. The green curve is a standard ANN (without sparsity constraint).
Fig. 3
Monte-Carlo simulation results based on four representative data sets, namely Breast, Wine, BCI_2240 and Yeoh of Table 1. The left boxplots show the accuracy results and the right boxplots show the number of selected needles. The horizontal red line is the accuracy obtained by always predicting the most frequent class (that is, without looking at the inputs).
Fig. 4
Summary of Monte-Carlo results for all data sets of Table 1. The x-axis measures sparsity on a log-scale with $(\hat{s}+1)/(p_1+1)$ and the y-axis is accuracy. Ideal points are near the top-left corner of the figure.

