Parameter convergence and learning curves for neural networks
- PMID: 10085428
- DOI: 10.1162/089976699300016647
Abstract
We revisit the oft-studied asymptotic (in sample size) behavior of the parameter or weight estimate returned by any member of a large family of neural network training algorithms. By properly accounting for the characteristic property of neural networks that their empirical and generalization errors possess multiple minima, we rigorously establish conditions under which the parameter estimate converges strongly into the set of minima of the generalization error. Convergence of the parameter estimate to a particular value cannot be guaranteed under our assumptions. We then evaluate the asymptotic distribution of the distance between the parameter estimate and its nearest neighbor among the set of minima of the generalization error. Results on this question have appeared numerous times and generally assert asymptotic normality, the conclusion expected from familiar statistical arguments concerned with maximum likelihood estimators. These conclusions are usually reached on the basis of somewhat informal calculations, although, as we shall see, the situation is delicate. The preceding results then yield learning curves for the generalization and empirical errors, together with bounds on their rates of convergence.
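To make the abstract's claims concrete, here is a minimal numerical sketch (not from the paper; the toy model, parameter names, and hyperparameters are all illustrative assumptions). It trains a one-parameter model y = w**2 * x whose generalization error has two global minima, at w = +1 and w = -1, so the weight estimate can only be expected to converge to the set of minima rather than to a single value. It then prints how the distance from the trained weight to its nearest minimum shrinks as the sample size grows, a crude empirical learning curve.

import numpy as np

rng = np.random.default_rng(0)
MINIMA = np.array([1.0, -1.0])  # both w = +1 and w = -1 minimize the generalization error

def train(n, steps=2000, lr=0.05):
    """Fit w by gradient descent on the empirical squared error of y = w**2 * x."""
    x = rng.normal(size=n)
    y = x + 0.3 * rng.normal(size=n)                # data generated with w**2 = 1
    w = rng.choice(MINIMA) * rng.uniform(0.5, 1.5)  # start inside one of the two basins
    for _ in range(steps):
        resid = w ** 2 * x - y
        w -= lr * np.mean(2.0 * resid * 2.0 * w * x)  # gradient of mean(resid**2) in w
    return w

for n in (50, 200, 800, 3200):
    dists = [np.min(np.abs(train(n) - MINIMA)) for _ in range(20)]
    print(f"n={n:4d}  mean distance to nearest minimum: {np.mean(dists):.4f}")

Under these assumptions the printed distances decay roughly like n**-0.5, consistent in spirit with the asymptotic distribution the abstract discusses for the distance between the parameter estimate and its nearest minimum.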
Similar articles
- Window-based example selection in learning vector quantization. Neural Comput. 2010 Nov;22(11):2924-61. doi: 10.1162/NECO_a_00030. PMID: 20804387
- Upper bound of the expected training error of neural network regression for a Gaussian noise sequence. Neural Netw. 2001 Dec;14(10):1419-29. doi: 10.1016/s0893-6080(01)00122-8. PMID: 11771721
- Algebraic geometrical methods for hierarchical learning machines. Neural Netw. 2001 Oct;14(8):1049-60. doi: 10.1016/s0893-6080(01)00069-7. PMID: 11681750
- Minimization of error functionals over perceptron networks. Neural Comput. 2008 Jan;20(1):252-70. doi: 10.1162/neco.2008.20.1.252. PMID: 18045008
- Constructive training methods for feedforward neural networks with binary weights. Int J Neural Syst. 1996 May;7(2):149-66. doi: 10.1142/s0129065796000129. PMID: 8823625. Review.
Cited by
- Deterministic convergence of chaos injection-based gradient method for training feedforward neural networks. Cogn Neurodyn. 2015 Jun;9(3):331-40. doi: 10.1007/s11571-014-9323-z. Epub 2015 Jan 1. PMID: 25972981. Free PMC article.
- Analysis of the adsorption and retention models for Cd, Cr, Cu, Ni, Pb, and Zn through neural networks: selection of variables and competitive model. Environ Sci Pollut Res Int. 2018 Sep;25(25):25551-25564. doi: 10.1007/s11356-018-2101-4. Epub 2018 Jun 29. PMID: 29959735
- Generalization of learning by synchronous waves: from perceptual organization to invariant organization. Cogn Neurodyn. 2011 Jun;5(2):113-32. doi: 10.1007/s11571-010-9142-9. Epub 2010 Dec 10. PMID: 22654985. Free PMC article.