J R Stat Soc Series B Stat Methodol. 2013 Sep 1;75(4):10.1111/rssb.12016.

doi: 10.1111/rssb.12016.

Large Covariance Estimation by Thresholding Principal Orthogonal Complements

Jianqing Fan¹, Yuan Liao², Martina Mincheva³

Affiliations

¹ Department of Operations Research and Financial Engineering, Princeton University ; Bendheim Center for Finance, Princeton University.
² Department of Mathematics, University of Maryland.
³ Department of Operations Research and Financial Engineering, Princeton University.

PMID: 24348088
PMCID: PMC3859166
DOI: 10.1111/rssb.12016

Jianqing Fan et al. J R Stat Soc Series B Stat Methodol. 2013.

Abstract

This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights into when factor analysis is approximately the same as principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real-data application on portfolio allocation is presented.

Keywords: High-dimensionality; approximate factor model; cross-sectional correlation; diverging eigenvalues; low-rank matrix; principal components; sparse matrix; thresholding; unknown factors.
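The abstract describes POET as keeping the leading principal components of the sample covariance and thresholding what remains (the principal orthogonal complement). The following is a minimal sketch of that idea, not the authors' implementation. The function name `poet_sketch`, the constant `C`, the soft-thresholding rule and the form of the threshold `tau` are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def poet_sketch(X, K, C=0.5):
    """Illustrative POET-style covariance estimate (hypothetical helper).

    X : (T, p) array, one row per observation.
    K : assumed number of factors (K = 0 means no low-rank part).
    C : user-chosen threshold constant (an assumption, not from the paper).
    """
    T, p = X.shape
    Xc = X - X.mean(axis=0)          # center each variable
    S = Xc.T @ Xc / T                # sample covariance, p x p

    # eigh returns eigenvalues in ascending order, so sort descending.
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Low-rank part built from the first K principal components.
    low_rank = (vecs[:, :K] * vals[:K]) @ vecs[:, :K].T

    # Principal orthogonal complement: what the first K components miss.
    R = S - low_rank

    # Assumed entry-dependent threshold, scaled by the residual variances.
    d = np.sqrt(np.clip(np.diag(R), 0.0, None))
    rate = np.sqrt(np.log(p) / T) + 1.0 / np.sqrt(p)
    tau = C * rate * np.outer(d, d)

    # Soft-threshold the off-diagonal entries; leave the diagonal as is.
    R_thr = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))

    return low_rank + R_thr
```

In this sketch, `K = 0` with `C = 0` returns the sample covariance, `K = 0` with `C > 0` is a plain thresholding estimator, and a very large `C` with `K > 0` leaves a diagonal residual as in a strict factor model, which mirrors the special cases listed in the abstract.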


Figures

**Figure 1**
Minimum eigenvalue of $\hat{\Sigma}_{u,\hat{K}}^{\mathcal{T}}(C)$ as a function of $C$ for three choices of thresholding rules. The plot is based on the simulated data set in Section 6.2.

**Figure 2**
Averages (left panel) and standard deviations (right panel) of the relative error $p^{-1/2}\|\Sigma^{-1/2}\hat{\Sigma}\Sigma^{-1/2} - I_p\|_F$ with known factors ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{obs}}$, solid red curve), POET ($\hat{\Sigma} = \hat{\Sigma}_{\hat{K}}$, solid blue curve), and sample covariance ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{sam}}$, dashed curve) over 200 simulations, as a function of the dimensionality $p$. Top panel: $p$ ranges from 20 to 600 in increments of 20; bottom panel: $p$ ranges from 1 to 100 in increments of 1.

**Figure 3**
Averages (left panel) and standard deviations (right panel) of $\|\hat{\Sigma}^{-1} - \Sigma^{-1}\|$ with known factors ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{obs}}$, solid red curve), POET ($\hat{\Sigma} = \hat{\Sigma}_{\hat{K}}$, solid blue curve), and sample covariance ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{sam}}$, dashed curve) over 200 simulations, as a function of the dimensionality $p$. Top panel: $p$ ranges from 20 to 600 in increments of 20; middle panel: $p$ ranges from 1 to 100 in increments of 1; bottom panel: the same as the top panel with the dashed curve excluded.

**Figure 4**
Averages (left panel) and standard deviations (right panel) of $\|\hat{\Sigma} - \Sigma\|_{\max}$ with known factors ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{obs}}$, solid red curve), POET ($\hat{\Sigma} = \hat{\Sigma}_{\hat{K}}$, solid blue curve), and sample covariance ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{sam}}$, dashed curve) over 200 simulations, as a function of the dimensionality $p$. The three curves are nearly indistinguishable.

**Figure 5**
Averages of $\|\hat{\Sigma} - \Sigma\|$ (left panel) and $\|\Sigma^{-1/2}\hat{\Sigma}\Sigma^{-1/2} - I_p\|$ (right panel) with known factors ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{obs}}$, solid red curve), POET ($\hat{\Sigma} = \hat{\Sigma}_{\hat{K}\,\omega}$, solid blue curve), and sample covariance ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{sam}}$, dashed curve) over 200 simulations, as a function of the dimensionality $p$. The three curves are hardly distinguishable in the left panel.

**Figure 6**
Robustness of K as p increases for various choices of K (Design 1, T = 300). Top left: $\|\hat{\Sigma}_{u,K}^{\mathcal{T}} - \Sigma_u\|$; top right: $\|(\hat{\Sigma}_{u,K}^{\mathcal{T}})^{-1} - \Sigma_u^{-1}\|$; bottom left: $\|\hat{\Sigma}_K - \Sigma\|_{\Sigma}$; bottom right: $\|\hat{\Sigma}_K^{-1} - \Sigma^{-1}\|$.

**Figure 7**
Box plots of regrets R(ŵ) − R^* for p = 80 and 140. In each panel, the box plots from left to right correspond to ŵ obtained using Σ̂ based on the approximate factor model, the strict factor model, and the sample covariance, respectively.

**Figure 8**
Estimation errors for risk assessments as a function of the portfolio size p. The left panel plots the average absolute error |R(ŵ) − R̂(ŵ)| and the right panel depicts the average relative error |R̂(ŵ)/R(ŵ) − 1|. Here, ŵ and R̂ are obtained based on three estimators of Σ̂.

**Figure 9**
Heatmap of thresholded error correlation matrix for number of factors K = 0, K = 1, K = 2 and K = 3.

**Figure 10**
Risk of portfolios created with POET and SFM (strict factor model)
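
The relative error named in the Figure 2 caption, $p^{-1/2}\|\Sigma^{-1/2}\hat{\Sigma}\Sigma^{-1/2} - I_p\|_F$, can be computed as in the sketch below. The function name `relative_error` is hypothetical, and the code assumes that the true covariance is symmetric positive definite so that its inverse square root exists.

```python
import numpy as np


def relative_error(sigma_hat, sigma):
    """p^{-1/2} * || Sigma^{-1/2} Sigma_hat Sigma^{-1/2} - I_p ||_F.

    sigma is assumed symmetric positive definite (hypothetical helper).
    """
    p = sigma.shape[0]
    # Inverse square root via the eigen-decomposition of sigma.
    vals, vecs = np.linalg.eigh(sigma)
    inv_sqrt = (vecs / np.sqrt(vals)) @ vecs.T
    M = inv_sqrt @ sigma_hat @ inv_sqrt - np.eye(p)
    return np.linalg.norm(M, "fro") / np.sqrt(p)
```

A perfect estimate gives an error of zero, and the scaling by $p^{-1/2}$ keeps the quantity comparable across dimensions: for example, an estimate equal to twice the true covariance gives an error of 1 for any $p$.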


Cited by

- Li J, Lai Y, Zhang C, Zhang Q. TGCnA: temporal gene coexpression network analysis using a low-rank plus sparse framework. J Appl Stat. 2019 Sep 16;47(6):1064-1083. doi: 10.1080/02664763.2019.1667311. eCollection 2020. PMID: 35706920.
- Fan J, Wang W, Zhu Z. A shrinkage principle for heavy-tailed data: high-dimensional robust low-rank matrix recovery. Ann Stat. 2021 Jun;49(3):1239-1266. doi: 10.1214/20-aos1980. Epub 2021 Aug 9. PMID: 34556893.
- Chen Y, Fan J, Ma C, Yan Y. Inference and uncertainty quantification for noisy matrix completion. Proc Natl Acad Sci U S A. 2019 Nov 12;116(46):22931-22937. doi: 10.1073/pnas.1910053116. Epub 2019 Oct 30. PMID: 31666329.
- Hao N, Dong B, Fan J. Sparsifying the Fisher Linear Discriminant by Rotation. J R Stat Soc Series B Stat Methodol. 2015 Sep 1;77(4):827-851. doi: 10.1111/rssb.12092. Epub 2014 Nov 7. PMID: 26512210.
- Miettinen J, Matilainen M, Nordhausen K, Taskinen S. Extracting Conditionally Heteroskedastic Components using Independent Component Analysis. J Time Ser Anal. 2020 Mar;41(2):293-311. doi: 10.1111/jtsa.12505. Epub 2019 Sep 8. PMID: 32508370.


