J R Stat Soc Series B Stat Methodol. 2013 Sep 1;75(4):10.1111/rssb.12016.
doi: 10.1111/rssb.12016.

Large Covariance Estimation by Thresholding Principal Orthogonal Complements


Jianqing Fan et al. J R Stat Soc Series B Stat Methodol. 2013.

Abstract

This paper deals with the estimation of a high-dimensional covariance matrix with a conditional sparsity structure and fast-diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as special cases. We provide mathematical insights into when factor analysis is approximately the same as principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.

Keywords: High-dimensionality; approximate factor model; cross-sectional correlation; diverging eigenvalues; low-rank matrix; principal components; sparse matrix; thresholding; unknown factors.
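The POET idea summarized in the abstract (keep the top principal components of the sample covariance, then threshold the remaining "principal orthogonal complement") can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the data matrix is taken to be p × T (rows are variables), the number of factors K is given rather than estimated, and we swap in soft thresholding with an entry-adaptive threshold $\tau_{ij} = C\,\omega_T\sqrt{\hat\theta_{ij}}$ and an assumed rate $\omega_T = 1/\sqrt{p} + \sqrt{\log p / T}$; the paper allows other thresholding rules (Figure 1 compares three). The function name `poet` and the default `C=0.5` are hypothetical. Setting `K=0, C=0` reproduces the sample covariance, and a very large `C` leaves a diagonal residual covariance (a strict factor model), mirroring the special cases listed in the abstract.

```python
import numpy as np


def poet(Y: np.ndarray, K: int, C: float = 0.5) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical POET sketch. Y is p x T (rows = variables, columns = time).

    Returns (sigma_hat, sigma_u_hat): the full covariance estimate and the
    thresholded principal orthogonal complement (residual covariance).
    """
    p, T = Y.shape
    Yc = Y - Y.mean(axis=1, keepdims=True)   # demean each variable
    S = Yc @ Yc.T / T                        # sample covariance (divides by T)

    # Top-K eigenpairs of the sample covariance (eigh returns ascending order).
    eigvals, eigvecs = np.linalg.eigh(S)
    top = np.argsort(eigvals)[::-1][:K]
    lam, xi = eigvals[top], eigvecs[:, top]
    low_rank = (xi * lam) @ xi.T             # sum_{i<=K} lam_i xi_i xi_i'

    # Principal orthogonal complement: project out the top-K directions.
    U = Yc - xi @ (xi.T @ Yc)                # p x T residuals
    R = U @ U.T / T                          # equals S - low_rank

    # Entry-adaptive threshold tau_ij = C * omega_T * sqrt(theta_ij), with
    # theta_ij = mean_t (u_it u_jt - R_ij)^2; omega_T is an assumed rate.
    U2 = U ** 2
    theta = np.clip(U2 @ U2.T / T - R ** 2, 0.0, None)
    omega_T = 1.0 / np.sqrt(p) + np.sqrt(np.log(p) / T)
    tau = C * omega_T * np.sqrt(theta)

    # Soft thresholding off the diagonal; the diagonal is kept as is.
    R_thr = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))

    return low_rank + R_thr, R_thr


# Demo on simulated factor data (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
p, T, K = 50, 200, 3
B = rng.normal(size=(p, K))                  # factor loadings
F = rng.normal(size=(K, T))                  # unobserved factors
Y = B @ F + rng.normal(size=(p, T))          # p x T observations
sigma_hat, sigma_u_hat = poet(Y, K=K, C=0.5)

# Like Figure 1: minimum eigenvalue of the thresholded residual covariance vs C.
for c in (0.25, 0.5, 1.0, 2.0):
    print(c, np.linalg.eigvalsh(poet(Y, K, c)[1]).min())
```

Scanning the minimum eigenvalue over `C`, as in the demo loop, is one way to look for thresholds that keep the residual covariance estimate positive definite.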


Figures

Figure 1
Minimum eigenvalue of $\hat{\Sigma}^{\mathcal{T}}_{u,\hat{K}}(C)$ as a function of $C$ for three choices of thresholding rules. The plot is based on the simulated data set in Section 6.2.
Figure 2
Averages (left panel) and standard deviations (right panel) of the relative error $p^{-1/2}\|\Sigma^{-1/2}\hat{\Sigma}\Sigma^{-1/2} - I_p\|_F$ with known factors ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{obs}}$, solid red curve), POET (solid blue curve), and sample covariance ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{sam}}$, dashed curve) over 200 simulations, as a function of the dimensionality $p$. Top panel: $p$ ranges from 20 to 600 in increments of 20; bottom panel: $p$ ranges from 1 to 100 in increments of 1.
Figure 3
Averages (left panel) and standard deviations (right panel) of $\|\hat{\Sigma}^{-1} - \Sigma^{-1}\|$ with known factors ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{obs}}$, solid red curve), POET (solid blue curve), and sample covariance ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{sam}}$, dashed curve) over 200 simulations, as a function of the dimensionality $p$. Top panel: $p$ ranges from 20 to 600 in increments of 20; middle panel: $p$ ranges from 1 to 100 in increments of 1; bottom panel: the same as the top panel with the dashed curve excluded.
Figure 4
Averages (left panel) and standard deviations (right panel) of $\|\hat{\Sigma} - \Sigma\|_{\max}$ with known factors ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{obs}}$, solid red curve), POET (solid blue curve), and sample covariance ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{sam}}$, dashed curve) over 200 simulations, as a function of the dimensionality $p$. The three curves are nearly indistinguishable.
Figure 5
Averages of $\|\hat{\Sigma} - \Sigma\|$ (left panel) and $\|\Sigma^{-1/2}\hat{\Sigma}\Sigma^{-1/2} - I_p\|$ (right panel) with known factors ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{obs}}$, solid red curve), POET (solid blue curve), and sample covariance ($\hat{\Sigma} = \hat{\Sigma}_{\mathrm{sam}}$, dashed curve) over 200 simulations, as a function of the dimensionality $p$. The three curves are hardly distinguishable in the left panel.
Figure 6
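Robustness of $K$ as $p$ increases for various choices of $K$ (Design 1, $T = 300$). Top left: $\|\hat{\Sigma}^{\mathcal{T}}_{u,K} - \Sigma_u\|$; top right: $\|(\hat{\Sigma}^{\mathcal{T}}_{u,K})^{-1} - \Sigma_u^{-1}\|$; bottom left: $\|\hat{\Sigma}_K - \Sigma\|_{\Sigma}$; bottom right: $\|\hat{\Sigma}_K^{-1} - \Sigma^{-1}\|$.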
Figure 7
Box plots of regrets $R(\hat{w}) - R^*$ for $p = 80$ and $140$. In each panel, the box plots from left to right correspond to $\hat{w}$ obtained using $\hat{\Sigma}$ based on the approximate factor model, the strict factor model, and the sample covariance, respectively.
Figure 8
Estimation errors for risk assessments as a function of the portfolio size $p$. The left panel plots the average absolute error $|R(\hat{w}) - \hat{R}(\hat{w})|$ and the right panel depicts the average relative error $|\hat{R}(\hat{w})/R(\hat{w}) - 1|$. Here, $\hat{w}$ and $\hat{R}$ are obtained based on three estimators $\hat{\Sigma}$.
Figure 9
Heatmaps of the thresholded error correlation matrix for $K = 0$, $1$, $2$ and $3$ factors.
Figure 10
Risk of portfolios created with POET and SFM (strict factor model)
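The accuracy measures in the figure captions (relative Frobenius error, inverse-matrix spectral error, max-norm error) and the portfolio quantities (regret $R(\hat w) - R^*$ and the risk-estimation errors) can be computed as in the NumPy sketch that follows. It rests on assumptions not stated on this page: risk is taken as the portfolio variance $R(w) = w^\top\Sigma w$, the portfolio is the unconstrained global minimum-variance portfolio $w = \Sigma^{-1}\mathbf{1}/(\mathbf{1}^\top\Sigma^{-1}\mathbf{1})$, and $R^*$ is the risk of that portfolio under the true $\Sigma$; the paper's exact portfolio problem may differ. All function names are hypothetical.

```python
import numpy as np


def inv_sqrt(sigma: np.ndarray) -> np.ndarray:
    """Symmetric inverse square root Sigma^{-1/2} via an eigendecomposition."""
    vals, vecs = np.linalg.eigh(sigma)
    return (vecs / np.sqrt(vals)) @ vecs.T


def relative_frobenius_error(sigma_hat, sigma):
    """p^{-1/2} || Sigma^{-1/2} Sigma_hat Sigma^{-1/2} - I_p ||_F."""
    p = sigma.shape[0]
    s = inv_sqrt(sigma)
    return np.linalg.norm(s @ sigma_hat @ s - np.eye(p), "fro") / np.sqrt(p)


def inverse_spectral_error(sigma_hat, sigma):
    """Spectral norm || Sigma_hat^{-1} - Sigma^{-1} ||."""
    return np.linalg.norm(np.linalg.inv(sigma_hat) - np.linalg.inv(sigma), 2)


def max_error(sigma_hat, sigma):
    """Entrywise max norm || Sigma_hat - Sigma ||_max."""
    return np.abs(sigma_hat - sigma).max()


def min_variance_weights(sigma):
    """Assumed portfolio rule: unconstrained global minimum-variance weights."""
    x = np.linalg.solve(sigma, np.ones(sigma.shape[0]))
    return x / x.sum()


def risk(w, sigma):
    """Portfolio variance w' Sigma w (assumed definition of risk)."""
    return float(w @ sigma @ w)


def regret(sigma_hat, sigma):
    """R(w_hat) - R*: true risk of the plug-in portfolio minus the oracle risk."""
    w_hat = min_variance_weights(sigma_hat)
    r_star = risk(min_variance_weights(sigma), sigma)
    return risk(w_hat, sigma) - r_star


def risk_errors(sigma_hat, sigma):
    """Absolute and relative errors of the estimated risk R_hat(w_hat)."""
    w_hat = min_variance_weights(sigma_hat)
    r_true, r_est = risk(w_hat, sigma), risk(w_hat, sigma_hat)
    return abs(r_true - r_est), abs(r_est / r_true - 1)


# Demo: evaluate the sample covariance against a known Sigma (illustrative sizes).
rng = np.random.default_rng(0)
p, T = 20, 100
A = rng.normal(size=(p, p))
sigma = A @ A.T / p + np.eye(p)
Y = np.linalg.cholesky(sigma) @ rng.normal(size=(p, T))
sigma_sam = np.cov(Y, bias=True)
print(relative_frobenius_error(sigma_sam, sigma), regret(sigma_sam, sigma))
print(risk_errors(sigma_sam, sigma))
```

Any estimator, such as a POET-type estimate, can be passed as `sigma_hat`; the regret is nonnegative by construction, since the oracle weights minimize the true variance among weights summing to one.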


References

    1. Agarwal A, Negahban S, Wainwright MJ. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. Ann Statist. 2012;40:1171–1197.
    2. Ahn S, Lee Y, Schmidt P. GMM estimation of linear panel data models with time-varying individual effects. J Econometrics. 2001;101:219–255.
    3. Alessi L, Barigozzi M, Capasso M. Improved penalization for determining the number of factors in approximate factor models. Statistics and Probability Letters. 2010;80:1806–1813.
    4. Amini AA, Wainwright MJ. High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann Statist. 2009;37:2877–2921.
    5. Antoniadis A, Fan J. Regularized wavelet approximations. J Amer Statist Assoc. 2001;96:939–967.
