Med Phys. 2009 Oct;36(10):4810-8. doi: 10.1118/1.3213517.

Noise injection for training artificial neural networks: a comparison with weight decay and early stopping

Richard M Zur et al.

Abstract

The purpose of this study was to investigate the effect of a noise injection method on the "overfitting" problem of artificial neural networks (ANNs) in two-class classification tasks. The authors compared ANNs trained with noise injection to ANNs trained with two other methods for avoiding overfitting: weight decay and early stopping. They also evaluated an automatic algorithm for selecting the magnitude of the injected noise. They performed simulation studies of an exclusive-or (XOR) classification task with training datasets of 50, 100, and 200 cases (half normal and half abnormal) and an independent testing dataset of 2000 cases. They also compared the methods on a breast ultrasound dataset of 1126 cases. For simulated training datasets of 50 cases, the area under the receiver operating characteristic curve (AUC) was greater (by 0.03) when training with noise injection than when training without any regularization, and this improvement exceeded those from weight decay and early stopping (both 0.02). For training datasets of 100 cases, noise injection and weight decay yielded similar increases in the AUC (0.02), whereas early stopping produced a smaller increase (0.01). For training datasets of 200 cases, the increases in the AUC were negligibly small for all methods (0.005). For the ultrasound dataset, ANNs trained with noise injection had a greater average AUC than ANNs trained without regularization and a slightly greater average AUC than ANNs trained with weight decay. These results indicate that training ANNs with noise injection can reduce overfitting to a greater degree than early stopping and to a similar degree as weight decay.
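To make the comparison concrete, here is a minimal Python sketch of the three regularization schemes using scikit-learn. The XOR-like data generator, the noise level sigma, the network size, and all hyperparameters are illustrative assumptions, not the authors' settings; note also that scikit-learn's early stopping monitors held-out accuracy, whereas the paper evaluates AUC.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_xor(n, rng, cluster_std=0.3):
    """Hypothetical stand-in for the paper's XOR population: four Gaussian
    clusters, with diagonally opposite clusters sharing a class label."""
    centers = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
    labels = np.array([0, 0, 1, 1])          # "normal" vs "abnormal"
    idx = rng.integers(0, 4, size=n)
    return centers[idx] + cluster_std * rng.standard_normal((n, 2)), labels[idx]

X_tr, y_tr = make_xor(50, rng)     # small training set, prone to overfitting
X_te, y_te = make_xor(2000, rng)   # large independent test set

# 1) Noise injection: every pass over the data sees the inputs jittered by a
#    fresh Gaussian draw (sigma is an assumed value here; the paper selects
#    it automatically by maximizing its Eq. 2).
sigma = 0.2
ni = MLPClassifier(hidden_layer_sizes=(6,), solver="sgd",
                   learning_rate_init=0.1, alpha=0.0, random_state=0)
for _ in range(500):
    ni.partial_fit(X_tr + sigma * rng.standard_normal(X_tr.shape),
                   y_tr, classes=[0, 1])

# 2) Weight decay: an L2 penalty on the weights (scikit-learn's `alpha`).
wd = MLPClassifier(hidden_layer_sizes=(6,), solver="sgd",
                   learning_rate_init=0.1, alpha=1e-2,
                   max_iter=500, random_state=0).fit(X_tr, y_tr)

# 3) Early stopping: halt when performance on a held-out split stops
#    improving (scikit-learn monitors accuracy, not AUC).
es = MLPClassifier(hidden_layer_sizes=(6,), solver="sgd",
                   learning_rate_init=0.1, early_stopping=True,
                   validation_fraction=0.2, max_iter=500,
                   random_state=0).fit(X_tr, y_tr)

for name, net in [("noise injection", ni), ("weight decay", wd),
                  ("early stopping", es)]:
    auc = roc_auc_score(y_te, net.predict_proba(X_te)[:, 1])
    print(f"{name:>16s}: test AUC = {auc:.3f}")
```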

Figures

Figure 1
The XOR population. Normal cases were drawn from the dotted-line probability density and abnormal cases were drawn from the solid-line probability density. The lines depict isopleths of the probability densities.
Figure 2
AUC values of two ANNs, estimated with three evaluation methods, as a function of the number of training iterations for (a) a simulated XOR dataset of 50 training cases and (b) a breast ultrasound dataset of 50 training cases. The .632+ bootstrap results were obtained from 50 bootstrap samples. The independent-validation results for the XOR data were obtained from an independent validation dataset of 2000 cases not used in any way during training; the breast ultrasound data had no independent validation dataset. The standard deviations of the AUC estimates, calculated over different training datasets, were approximately 0.04 for the resubstitution and independent-validation estimates and 0.06 for the .632+ bootstrap estimates.
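The .632+ bootstrap estimate referenced in this caption can be sketched as follows. This is an assumed adaptation of Efron and Tibshirani's .632+ estimator to AUC (taking 0.5 as the no-information AUC), not necessarily the paper's exact formulation; `fit_predict` is a hypothetical callable that trains a classifier on the given cases and returns decision scores for the evaluation cases.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_632plus(fit_predict, X, y, n_boot=50, rng=None):
    """Sketch of a .632+ bootstrap AUC estimate (assumed AUC adaptation)."""
    rng = rng or np.random.default_rng(0)
    n = len(y)
    auc_resub = roc_auc_score(y, fit_predict(X, y, X))  # train = test
    oob_aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)               # sample with replacement
        oob = np.setdiff1d(np.arange(n), idx)     # out-of-bag cases
        if len(oob) == 0 or len(np.unique(y[oob])) < 2:
            continue                              # need both classes to score
        oob_aucs.append(
            roc_auc_score(y[oob], fit_predict(X[idx], y[idx], X[oob])))
    auc_oob = float(np.mean(oob_aucs))
    # relative overfitting rate R in [0, 1]; AUC = 0.5 is the chance level
    R = np.clip((auc_resub - auc_oob) / max(auc_resub - 0.5, 1e-12), 0.0, 1.0)
    w = 0.632 / (1.0 - 0.368 * R)                 # w -> 1 as overfitting grows
    return (1.0 - w) * auc_resub + w * auc_oob
```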
Figure 3
Average ANN performance measured on the independent validation dataset when the ANNs were trained with noise injection using various noise kernel standard deviations. Empty circles represent the average AUC values, with error bars of one standard deviation. Filled circles represent the average AUC values of ANNs trained with noise standard deviations selected by maximizing Eq. 2; the vertical error bars represent one standard deviation in the AUC values, and the horizontal error bars represent one standard deviation in the selected noise kernel standard deviations. The ANNs in (a) had 6 hidden nodes and their performance was measured at the 485th training iteration; the ANNs in (b) had 20 hidden nodes and their performance was measured at the 1485th training iteration.
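Eq. 2, which the filled circles maximize, is not reproduced on this page. As a stand-in, the automatic selection can be sketched as a grid search that maximizes a bootstrap AUC estimate; this reuses `rng`, `X_tr`, `y_tr`, and `auc_632plus` from the sketches above, and the grid values are arbitrary.

```python
from sklearn.neural_network import MLPClassifier

def score_sigma(sigma, X, y):
    """Bootstrap AUC of a noise-injected ANN at one noise kernel sigma."""
    def fit_predict(X_fit, y_fit, X_eval):
        net = MLPClassifier(hidden_layer_sizes=(6,), solver="sgd",
                            learning_rate_init=0.1, random_state=0)
        for _ in range(500):        # fresh Gaussian jitter on every pass
            net.partial_fit(X_fit + sigma * rng.standard_normal(X_fit.shape),
                            y_fit, classes=[0, 1])
        return net.predict_proba(X_eval)[:, 1]
    return auc_632plus(fit_predict, X, y, n_boot=50, rng=rng)

sigmas = [0.05, 0.1, 0.2, 0.4, 0.8]   # arbitrary search grid
best_sigma = max(sigmas, key=lambda s: score_sigma(s, X_tr, y_tr))
```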
Figure 4
Contour plots showing classification decision boundaries for (a) ANNs trained without regularization (i.e., overfitting), (b) ANNs trained with noise injection, and (c) ANNs trained with weight decay. The ANNs were trained on a dataset of 50 cases drawn from the breast ultrasound dataset, and the boundaries are shown for fixed values of features 1 and 2.
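A decision-boundary contour plot in the spirit of this figure can be produced for any trained classifier with a `predict_proba` method. This generic matplotlib sketch varies two features over a grid (any remaining features would be held at fixed values, as in the caption) and reuses the noise-injected network `ni` and training data from the first sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(net, X, y, grid_res=200):
    """Shade predicted class-1 probability over a 2D feature grid and draw
    the 0.5 contour (the decision boundary) over the training cases."""
    x0, x1 = X[:, 0], X[:, 1]
    xx, yy = np.meshgrid(
        np.linspace(x0.min() - 0.5, x0.max() + 0.5, grid_res),
        np.linspace(x1.min() - 0.5, x1.max() + 0.5, grid_res))
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    zz = net.predict_proba(pts)[:, 1].reshape(xx.shape)
    plt.contourf(xx, yy, zz, levels=20, cmap="RdBu_r", alpha=0.6)
    plt.contour(xx, yy, zz, levels=[0.5], colors="k")  # decision boundary
    plt.scatter(x0, x1, c=y, cmap="RdBu_r", edgecolor="k", s=20)
    plt.xlabel("feature 1")
    plt.ylabel("feature 2")
    plt.show()

plot_decision_boundary(ni, X_tr, y_tr)  # e.g., the noise-injected ANN above
```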
