Benign overfitting in linear regression
- PMID: 32332161
- PMCID: PMC7720150
- DOI: 10.1073/pnas.1907378117
Abstract
The phenomenon of benign overfitting is one of the key mysteries uncovered by deep learning methodology: deep neural networks seem to predict well, even with a perfect fit to noisy training data. Motivated by this phenomenon, we consider when a perfect fit to training data in linear regression is compatible with accurate prediction. We give a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization is in terms of two notions of the effective rank of the data covariance. It shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size. By studying examples of data covariance properties that this characterization shows are required for benign overfitting, we find an important role for finite-dimensional data: the accuracy of the minimum norm interpolating prediction rule approaches the best possible accuracy for a much narrower range of properties of the data distribution when the data lie in an infinite-dimensional space than when they lie in a finite-dimensional space whose dimension grows faster than the sample size.
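To make the abstract's central objects concrete, the sketch below (Python/NumPy; the spectrum, constants, and variable names are illustrative assumptions, not code or values from the paper) fits the minimum l2-norm interpolator to noisy labels in an overparameterized problem and computes the paper's two effective-rank notions, r_k(Sigma) = (sum_{i>k} lambda_i)/lambda_{k+1} and R_k(Sigma) = (sum_{i>k} lambda_i)^2 / sum_{i>k} lambda_i^2.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's code):
# (1) the minimum l2-norm interpolator in an overparameterized linear
#     regression, which fits noisy labels exactly when p > n, and
# (2) the two effective ranks of a covariance spectrum,
#     r_k = (sum_{i>k} lam_i) / lam_{k+1},
#     R_k = (sum_{i>k} lam_i)^2 / sum_{i>k} lam_i^2.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2000                                    # sample size << dimension
lam = 1.0 / (1.0 + np.arange(p)) ** 0.9            # assumed decaying spectrum of Sigma
X = rng.standard_normal((n, p)) * np.sqrt(lam)     # rows ~ N(0, diag(lam))
theta_star = np.zeros(p)
theta_star[0] = 1.0                                # assumed true parameter
y = X @ theta_star + 0.1 * rng.standard_normal(n)  # noisy training labels

# Minimum-norm interpolator: theta_hat = X^T (X X^T)^{-1} y, i.e. pinv(X) @ y.
theta_hat = X.T @ np.linalg.solve(X @ X.T, y)
print("max train residual:", np.abs(X @ theta_hat - y).max())  # ~1e-13: perfect fit

def effective_ranks(eigs, k):
    """r_k and R_k of a spectrum given in nonincreasing order."""
    tail = eigs[k:]                                # lam_{k+1}, lam_{k+2}, ...
    return tail.sum() / tail[0], tail.sum() ** 2 / (tail ** 2).sum()

print("r_0, R_0:", effective_ranks(lam, 0))
```

Despite the (numerically) zero training error, the paper's characterization says the interpolator's prediction accuracy is governed by the tail of the covariance spectrum through these two effective ranks; the slowly decaying spectrum above is merely one assumed example of the kind of covariance structure for which overfitting can be benign.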
Keywords: interpolation; linear regression; overfitting; statistical learning theory.
Conflict of interest statement
The authors declare no competing interest.