Flat minima
- PMID: 9117894
- DOI: 10.1162/neco.1997.9.1.1
Abstract
We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a "flat" minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based, Bayesian argument suggests that flat minima correspond to "simple" networks and low expected overfitting. The argument is based on a Gibbs algorithm variant and a novel way of splitting generalization error into underfitting and overfitting error. Unlike many previous approaches, ours does not require Gaussian assumptions and does not depend on a "good" weight prior. Instead, we use a prior over input-output functions, thus taking into account net architecture and training set. Although our algorithm requires the computation of second-order derivatives, it has backpropagation's order of complexity. It automatically and effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms conventional backprop, weight decay, and "optimal brain surgeon"/"optimal brain damage".
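The abstract describes flatness only informally. As a rough illustration (not the paper's flat-minimum-search algorithm, which relies on second-order derivatives), the Python sketch below estimates how flat the error surface is around a given weight vector by measuring the average error increase under small random weight perturbations. The toy data, the tiny network, and the `flatness_score` helper are hypothetical and serve only to convey the intuition that a flat minimum tolerates perturbations with little change in error.

```python
# Illustrative sketch only: a random-perturbation proxy for "flatness",
# not the second-order flat-minimum-search algorithm from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data and a tiny one-hidden-layer network (hypothetical setup).
X = rng.normal(size=(100, 3))
y = np.sin(X @ np.array([1.0, -0.5, 0.25]))

def forward(w, X):
    W1 = w[:12].reshape(3, 4)    # input -> hidden weights
    W2 = w[12:16].reshape(4, 1)  # hidden -> output weights
    return np.tanh(X @ W1) @ W2

def mse(w):
    # Training error for a flattened weight vector w.
    return float(np.mean((forward(w, X).ravel() - y) ** 2))

def flatness_score(w, radius=0.05, n_samples=50):
    """Average error increase under random perturbations of size `radius`.
    Smaller values indicate a flatter region around `w`."""
    base = mse(w)
    increases = []
    for _ in range(n_samples):
        delta = rng.normal(size=w.shape)
        delta *= radius / np.linalg.norm(delta)  # scale to fixed perturbation radius
        increases.append(mse(w + delta) - base)
    return float(np.mean(increases))

w = rng.normal(scale=0.5, size=16)  # stand-in for trained weights
print("error:", mse(w), "flatness score:", flatness_score(w))
```

Comparing the score at two different minima of similar training error would, under this proxy, favor the one whose error rises least under perturbation, mirroring the preference for flat minima argued for in the abstract.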
Similar articles
- Analytical interpretation of feed-forward nets outputs after training. Int J Neural Syst. 1996 Mar;7(1):19-27. doi: 10.1142/s0129065796000038. PMID: 8828047
- A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Netw. 2008 Jun;21(5):786-95. doi: 10.1016/j.neunet.2007.12.036. Epub 2007 Dec 31. PMID: 18249524
- Robust sequential learning of feedforward neural networks in the presence of heavy-tailed noise. Neural Netw. 2015 Mar;63:31-47. doi: 10.1016/j.neunet.2014.11.001. Epub 2014 Nov 15. PMID: 25436486
- Parameter convergence and learning curves for neural networks. Neural Comput. 1999 Apr 1;11(3):747-70. doi: 10.1162/089976699300016647. PMID: 10085428. Review.
- Bayesian regularization of neural networks. Methods Mol Biol. 2008;458:25-44. doi: 10.1007/978-1-60327-101-1_3. PMID: 19065804. Review.
Cited by
- An Advanced Long Short-Term Memory (LSTM) Neural Network Method for Predicting Rate of Penetration (ROP). ACS Omega. 2022 Dec 21;8(1):934-945. doi: 10.1021/acsomega.2c06308. eCollection 2023 Jan 10. PMID: 36643527. Free PMC article.
- The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima. Proc Natl Acad Sci U S A. 2021 Mar 2;118(9):e2015617118. doi: 10.1073/pnas.2015617118. PMID: 33619091. Free PMC article.
- Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives. Med Image Anal. 2023 Apr;85:102762. doi: 10.1016/j.media.2023.102762. Epub 2023 Jan 31. PMID: 36738650. Free PMC article. Review.
- Automated Depression Detection Using Deep Representation and Sequence Learning with EEG Signals. J Med Syst. 2019 May 28;43(7):205. doi: 10.1007/s10916-019-1345-y. PMID: 31139932
- Archetypal landscapes for deep neural networks. Proc Natl Acad Sci U S A. 2020 Sep 8;117(36):21857-21864. doi: 10.1073/pnas.1919995117. Epub 2020 Aug 25. PMID: 32843349. Free PMC article.