Med Phys. 2009 Oct;36(10):4810-8. doi: 10.1118/1.3213517.

Noise injection for training artificial neural networks: a comparison with weight decay and early stopping

Richard M Zur et al.

Abstract

The purpose of this study was to investigate the effect of a noise injection method on the "overfitting" problem of artificial neural networks (ANNs) in two-class classification tasks. The authors compared ANNs trained with noise injection to ANNs trained with two other methods for avoiding overfitting: weight decay and early stopping. They also evaluated an automatic algorithm for selecting the magnitude of the injected noise. They performed simulation studies of an exclusive-or (XOR) classification task with training datasets of 50, 100, and 200 cases (half normal and half abnormal) and an independent testing dataset of 2000 cases. They also compared the methods on a breast ultrasound dataset of 1126 cases. For simulated training datasets of 50 cases, the area under the receiver operating characteristic curve (AUC) was greater (by 0.03) when training with noise injection than when training without any regularization, and this improvement exceeded those from weight decay and early stopping (both 0.02). For training datasets of 100 cases, noise injection and weight decay yielded similar increases in the AUC (0.02), whereas early stopping produced a smaller increase (0.01). For training datasets of 200 cases, the increases in the AUC were negligibly small for all methods (0.005). For the ultrasound dataset, ANNs trained with noise injection had a greater average AUC than ANNs trained without regularization and a slightly greater average AUC than ANNs trained with weight decay. These results indicate that training ANNs with noise injection can reduce overfitting to a greater degree than early stopping and to a similar degree as weight decay.
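To make the comparison concrete, here is a minimal Python sketch of the three regularization schemes using scikit-learn. The XOR-like data generator, the noise level sigma, the network size, and all hyperparameters are illustrative assumptions, not the authors' settings; note also that scikit-learn's early stopping monitors held-out accuracy, whereas the paper evaluates AUC.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_xor(n, rng, cluster_std=0.3):
    """Hypothetical stand-in for the paper's XOR population: four Gaussian
    clusters, with diagonally opposite clusters sharing a class label."""
    centers = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
    labels = np.array([0, 0, 1, 1])          # "normal" vs "abnormal"
    idx = rng.integers(0, 4, size=n)
    return centers[idx] + cluster_std * rng.standard_normal((n, 2)), labels[idx]

X_tr, y_tr = make_xor(50, rng)     # small training set, prone to overfitting
X_te, y_te = make_xor(2000, rng)   # large independent test set

# 1) Noise injection: every pass over the data sees the inputs jittered by a
#    fresh Gaussian draw (sigma is an assumed value here; the paper selects
#    it automatically by maximizing its Eq. 2).
sigma = 0.2
ni = MLPClassifier(hidden_layer_sizes=(6,), solver="sgd",
                   learning_rate_init=0.1, alpha=0.0, random_state=0)
for _ in range(500):
    ni.partial_fit(X_tr + sigma * rng.standard_normal(X_tr.shape),
                   y_tr, classes=[0, 1])

# 2) Weight decay: an L2 penalty on the weights (scikit-learn's `alpha`).
wd = MLPClassifier(hidden_layer_sizes=(6,), solver="sgd",
                   learning_rate_init=0.1, alpha=1e-2,
                   max_iter=500, random_state=0).fit(X_tr, y_tr)

# 3) Early stopping: halt when performance on a held-out split stops
#    improving (scikit-learn monitors accuracy, not AUC).
es = MLPClassifier(hidden_layer_sizes=(6,), solver="sgd",
                   learning_rate_init=0.1, early_stopping=True,
                   validation_fraction=0.2, max_iter=500,
                   random_state=0).fit(X_tr, y_tr)

for name, net in [("noise injection", ni), ("weight decay", wd),
                  ("early stopping", es)]:
    auc = roc_auc_score(y_te, net.predict_proba(X_te)[:, 1])
    print(f"{name:>16s}: test AUC = {auc:.3f}")
```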

Figures

Figure 1
The XOR population. Normal cases were drawn from the dotted-line probability density and abnormal cases were drawn from the solid-line probability density. The lines depict isopleths of the probability densities.
Figure 2
AUC values of two ANNs, estimated with three evaluation methods, as a function of the number of training iterations for (a) a simulated XOR dataset of 50 training cases and (b) a breast ultrasound dataset of 50 training cases. The .632+ bootstrap results were obtained from 50 bootstrap samples. The independent-validation results for the XOR data were obtained from an independent validation dataset of 2000 cases not used in any way during training; the breast ultrasound data had no independent validation dataset. The standard deviations of the AUC estimates, calculated over different training datasets, were approximately 0.04 for the resubstitution and independent-validation estimates and 0.06 for the .632+ bootstrap estimates.
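The .632+ bootstrap estimate referenced in this caption can be sketched as follows. This is an assumed adaptation of Efron and Tibshirani's .632+ estimator to AUC (taking 0.5 as the no-information AUC), not necessarily the paper's exact formulation; `fit_predict` is a hypothetical callable that trains a classifier on the given cases and returns decision scores for the evaluation cases.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_632plus(fit_predict, X, y, n_boot=50, rng=None):
    """Sketch of a .632+ bootstrap AUC estimate (assumed AUC adaptation)."""
    rng = rng or np.random.default_rng(0)
    n = len(y)
    auc_resub = roc_auc_score(y, fit_predict(X, y, X))  # train = test
    oob_aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)               # sample with replacement
        oob = np.setdiff1d(np.arange(n), idx)     # out-of-bag cases
        if len(oob) == 0 or len(np.unique(y[oob])) < 2:
            continue                              # need both classes to score
        oob_aucs.append(
            roc_auc_score(y[oob], fit_predict(X[idx], y[idx], X[oob])))
    auc_oob = float(np.mean(oob_aucs))
    # relative overfitting rate R in [0, 1]; AUC = 0.5 is the chance level
    R = np.clip((auc_resub - auc_oob) / max(auc_resub - 0.5, 1e-12), 0.0, 1.0)
    w = 0.632 / (1.0 - 0.368 * R)                 # w -> 1 as overfitting grows
    return (1.0 - w) * auc_resub + w * auc_oob
```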
Figure 3
Average ANN performance measured on the independent validation dataset when the ANNs were trained with noise injection using various noise kernel standard deviations. Empty circles represent the average AUC values, with error bars of one standard deviation. Filled circles represent the average AUC values of ANNs trained with noise standard deviations selected by maximizing Eq. 2; the vertical error bars represent one standard deviation in the AUC values, and the horizontal error bars represent one standard deviation in the selected noise kernel standard deviations. The ANNs in (a) had 6 hidden nodes and their performance was measured at the 485th training iteration; the ANNs in (b) had 20 hidden nodes and their performance was measured at the 1485th training iteration.
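Eq. 2, which the filled circles maximize, is not reproduced on this page. As a stand-in, the automatic selection can be sketched as a grid search that maximizes a bootstrap AUC estimate; this reuses `rng`, `X_tr`, `y_tr`, and `auc_632plus` from the sketches above, and the grid values are arbitrary.

```python
from sklearn.neural_network import MLPClassifier

def score_sigma(sigma, X, y):
    """Bootstrap AUC of a noise-injected ANN at one noise kernel sigma."""
    def fit_predict(X_fit, y_fit, X_eval):
        net = MLPClassifier(hidden_layer_sizes=(6,), solver="sgd",
                            learning_rate_init=0.1, random_state=0)
        for _ in range(500):        # fresh Gaussian jitter on every pass
            net.partial_fit(X_fit + sigma * rng.standard_normal(X_fit.shape),
                            y_fit, classes=[0, 1])
        return net.predict_proba(X_eval)[:, 1]
    return auc_632plus(fit_predict, X, y, n_boot=50, rng=rng)

sigmas = [0.05, 0.1, 0.2, 0.4, 0.8]   # arbitrary search grid
best_sigma = max(sigmas, key=lambda s: score_sigma(s, X_tr, y_tr))
```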
Figure 4
Contour plots showing classification decision boundaries for (a) ANNs trained without regularization (i.e., overfitting), (b) ANNs trained with noise injection, and (c) ANNs trained with weight decay. The ANNs were trained on a dataset of 50 cases drawn from the breast ultrasound dataset, and the boundaries are shown for fixed values of features 1 and 2.
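A decision-boundary contour plot in the spirit of this figure can be produced for any trained classifier with a `predict_proba` method. This generic matplotlib sketch varies two features over a grid (any remaining features would be held at fixed values, as in the caption) and reuses the noise-injected network `ni` and training data from the first sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(net, X, y, grid_res=200):
    """Shade predicted class-1 probability over a 2D feature grid and draw
    the 0.5 contour (the decision boundary) over the training cases."""
    x0, x1 = X[:, 0], X[:, 1]
    xx, yy = np.meshgrid(
        np.linspace(x0.min() - 0.5, x0.max() + 0.5, grid_res),
        np.linspace(x1.min() - 0.5, x1.max() + 0.5, grid_res))
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    zz = net.predict_proba(pts)[:, 1].reshape(xx.shape)
    plt.contourf(xx, yy, zz, levels=20, cmap="RdBu_r", alpha=0.6)
    plt.contour(xx, yy, zz, levels=[0.5], colors="k")  # decision boundary
    plt.scatter(x0, x1, c=y, cmap="RdBu_r", edgecolor="k", s=20)
    plt.xlabel("feature 1")
    plt.ylabel("feature 2")
    plt.show()

plot_decision_boundary(ni, X_tr, y_tr)  # e.g., the noise-injected ANN above
```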
