CANONICAL THRESHOLDING FOR NON-SPARSE HIGH-DIMENSIONAL LINEAR REGRESSION

Igor Silin¹, Jianqing Fan¹

PMID: 36148472
PMCID: PMC9491498
DOI: 10.1214/21-aos2116

Ann Stat. 2022 Feb;50(1):460-486. doi: 10.1214/21-aos2116. Epub 2022 Feb 16.

Affiliation

¹ Princeton University.

Abstract

We consider a high-dimensional linear regression problem. Unlike many papers on the topic, we do not require sparsity of the regression coefficients; instead, our main structural assumption is a decay of the eigenvalues of the covariance matrix of the data. We propose a new family of estimators, called the canonical thresholding estimators, which pick the largest regression coefficients in the canonical form. The estimators admit an explicit form and can be linked to LASSO and Principal Component Regression (PCR). A theoretical analysis is provided for both fixed design and random design settings. The resulting bounds on the mean squared error and the prediction error of a specific estimator from the family allow us to state clear sufficient conditions on the decay of eigenvalues that ensure convergence. In addition, we promote the use of relative errors, which are strongly linked to the out-of-sample $R^2$. The study of these relative errors leads to a new concept of joint effective dimension, which incorporates the covariance of the data and the regression coefficients simultaneously and describes the complexity of a linear regression problem. Some minimax lower bounds are established to showcase the optimality of our procedure. Numerical simulations confirm the good performance of the proposed estimators compared with previously developed methods.

Keywords: High-dimensional linear regression; covariance eigenvalues decay; principal component regression; relative errors; thresholding. MSC: Primary 62J05; secondary 62H12, 62H25.
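
The idea described in the abstract, keeping the largest coefficients in the canonical form and comparing that with PCR-style truncation, can be illustrated with a minimal sketch. This is not the authors' exact estimator: it assumes the canonical form is obtained from the singular value decomposition of the design matrix, with standardized principal-component scores as regressors, and it uses plain hard thresholding. The function names `canonical_ls`, `hard_threshold_estimate`, and `pcr_estimate`, and the threshold parameter `tau`, are hypothetical names introduced for illustration.

```python
import numpy as np


def canonical_ls(X, y):
    """Least squares coefficients in an (assumed) canonical form.

    Assumption: with X = U diag(s) V^T, the canonical regressors are
    Z = sqrt(n) * U, so that Z^T Z / n = I, and the canonical
    coefficients are theta = diag(s) V^T beta / sqrt(n).
    """
    n = X.shape[0]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Drop numerically null directions so that 1 / s stays finite.
    keep = s > 1e-10 * s.max()
    U, s, Vt = U[:, keep], s[keep], Vt[keep]
    theta_ls = U.T @ y / np.sqrt(n)  # equals Z^T y / n
    return theta_ls, s, Vt


def _to_original_basis(theta, s, Vt, n):
    # Invert theta = diag(s) V^T beta / sqrt(n) on the retained directions.
    return Vt.T @ (theta * np.sqrt(n) / s)


def hard_threshold_estimate(X, y, tau):
    """Keep canonical coefficients with |theta_j| >= tau, zero the rest."""
    theta_ls, s, Vt = canonical_ls(X, y)
    theta = np.where(np.abs(theta_ls) >= tau, theta_ls, 0.0)
    return _to_original_basis(theta, s, Vt, X.shape[0])


def pcr_estimate(X, y, k):
    """PCR-style truncation: keep the first k components, zero the rest."""
    theta_ls, s, Vt = canonical_ls(X, y)
    theta = theta_ls.copy()
    theta[k:] = 0.0
    return _to_original_basis(theta, s, Vt, X.shape[0])
```

Under these assumptions, the two functions differ only in which canonical coefficients survive: PCR keeps the leading components by index, whereas thresholding keeps components by the magnitude of their coefficients. With `tau = 0` (or `k` equal to the rank of `X`) both reduce to the minimum-norm least squares solution.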

Figures

**Fig 1:**
Comparison of the NCT, GCT, and PCR estimators on an artificial example with r = 12. The horizontal axes show the component index j; the vertical axes show the canonical least squares coefficient $\tilde{\theta}_{j}^{LS}$. The red dotted lines depict the thresholding boundaries. Coefficients falling into the shaded area are thresholded or truncated to zero and depicted in gray. Coefficients surviving the thresholding or truncation are depicted in purple, orange, and teal, respectively.

**Fig 2:**
Dependence of $D_{q,k}^{\mathrm{eff}}(\Sigma, \beta)$ on d.

**Fig 3:**
Rates for the relative errors of the NCT estimator in the polynomial decay scenario.

Grants and funding

R01 GM072611/GM/NIGMS NIH HHS/United States
