Flat minima
- PMID: 9117894
- DOI: 10.1162/neco.1997.9.1.1
Abstract
We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a "flat" minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based, Bayesian argument suggests that flat minima correspond to "simple" networks and low expected overfitting. The argument is based on a Gibbs algorithm variant and a novel way of splitting generalization error into underfitting and overfitting error. Unlike many previous approaches, ours does not require Gaussian assumptions and does not depend on a "good" weight prior. Instead, we use a prior over input-output functions, thus taking into account net architecture and training set. Although our algorithm requires the computation of second-order derivatives, it has backpropagation's order of complexity. It automatically and effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms conventional backprop, weight decay, and "optimal brain surgeon"/"optimal brain damage".
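The abstract describes flatness only informally. As a rough illustration (not the paper's flat-minimum-search algorithm, which relies on second-order derivatives), the Python sketch below estimates how flat the error surface is around a given weight vector by measuring the average error increase under small random weight perturbations. The toy data, the tiny network, and the `flatness_score` helper are hypothetical and serve only to convey the intuition that a flat minimum tolerates perturbations with little change in error.

```python
# Illustrative sketch only: a random-perturbation proxy for "flatness",
# not the second-order flat-minimum-search algorithm from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data and a tiny one-hidden-layer network (hypothetical setup).
X = rng.normal(size=(100, 3))
y = np.sin(X @ np.array([1.0, -0.5, 0.25]))

def forward(w, X):
    W1 = w[:12].reshape(3, 4)    # input -> hidden weights
    W2 = w[12:16].reshape(4, 1)  # hidden -> output weights
    return np.tanh(X @ W1) @ W2

def mse(w):
    # Training error for a flattened weight vector w.
    return float(np.mean((forward(w, X).ravel() - y) ** 2))

def flatness_score(w, radius=0.05, n_samples=50):
    """Average error increase under random perturbations of size `radius`.
    Smaller values indicate a flatter region around `w`."""
    base = mse(w)
    increases = []
    for _ in range(n_samples):
        delta = rng.normal(size=w.shape)
        delta *= radius / np.linalg.norm(delta)  # scale to fixed perturbation radius
        increases.append(mse(w + delta) - base)
    return float(np.mean(increases))

w = rng.normal(scale=0.5, size=16)  # stand-in for trained weights
print("error:", mse(w), "flatness score:", flatness_score(w))
```

Comparing the score at two different minima of similar training error would, under this proxy, favor the one whose error rises least under perturbation, mirroring the preference for flat minima argued for in the abstract.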
Similar articles
- Analytical interpretation of feed-forward nets outputs after training. Int J Neural Syst. 1996 Mar;7(1):19-27. doi: 10.1142/s0129065796000038. PMID: 8828047
- A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Netw. 2008 Jun;21(5):786-95. doi: 10.1016/j.neunet.2007.12.036. Epub 2007 Dec 31. PMID: 18249524
- Robust sequential learning of feedforward neural networks in the presence of heavy-tailed noise. Neural Netw. 2015 Mar;63:31-47. doi: 10.1016/j.neunet.2014.11.001. Epub 2014 Nov 15. PMID: 25436486
- Parameter convergence and learning curves for neural networks. Neural Comput. 1999 Apr 1;11(3):747-70. doi: 10.1162/089976699300016647. PMID: 10085428. Review.
- Bayesian regularization of neural networks. Methods Mol Biol. 2008;458:25-44. doi: 10.1007/978-1-60327-101-1_3. PMID: 19065804. Review.
Cited by
- An Advanced Long Short-Term Memory (LSTM) Neural Network Method for Predicting Rate of Penetration (ROP). ACS Omega. 2022 Dec 21;8(1):934-945. doi: 10.1021/acsomega.2c06308. eCollection 2023 Jan 10. PMID: 36643527. Free PMC article.
- The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima. Proc Natl Acad Sci U S A. 2021 Mar 2;118(9):e2015617118. doi: 10.1073/pnas.2015617118. PMID: 33619091. Free PMC article.
- Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives. Med Image Anal. 2023 Apr;85:102762. doi: 10.1016/j.media.2023.102762. Epub 2023 Jan 31. PMID: 36738650. Free PMC article. Review.
- Automated Depression Detection Using Deep Representation and Sequence Learning with EEG Signals. J Med Syst. 2019 May 28;43(7):205. doi: 10.1007/s10916-019-1345-y. PMID: 31139932
- Archetypal landscapes for deep neural networks. Proc Natl Acad Sci U S A. 2020 Sep 8;117(36):21857-21864. doi: 10.1073/pnas.1919995117. Epub 2020 Aug 25. PMID: 32843349. Free PMC article.