J Multivar Anal. 2016 Sep;150:55-74.
doi: 10.1016/j.jmva.2016.05.002. Epub 2016 May 19.

Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data

T Tony Cai et al. J Multivar Anal. 2016 Sep.

Abstract

Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data.

Keywords: Adaptive thresholding; bandable covariance matrix; generalized sample covariance matrix; missing data; optimal rate of convergence; sparse covariance matrix; thresholding.
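
As a rough illustration of the kind of estimator the abstract describes, the sketch below computes a covariance estimate from incomplete data by averaging each entry over the samples in which both variables are observed, then applies hard thresholding for the sparse case. It is a minimal sketch under stated assumptions, not the authors' implementation: missing values are assumed to be encoded as NaN, the function names `generalized_sample_covariance` and `hard_threshold` are hypothetical, and the universal threshold with constant `delta` is a simplification swapped in for the adaptive thresholding named in the keywords.

```python
import numpy as np


def generalized_sample_covariance(X):
    """Entrywise covariance estimate from data with missing values (NaN).

    Assumption: each entry (j, k) is averaged over the rows in which
    both variable j and variable k are observed.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                  # True where a value is present
    S = observed.astype(float)

    n_j = S.sum(axis=0)                      # observations per variable
    if np.any(n_j == 0):
        raise ValueError("a variable has no observed values")

    # Per-variable means over the observed entries only.
    means = np.where(observed, X, 0.0).sum(axis=0) / n_j

    # Center observed entries; missing entries contribute zero to the sums.
    centered = np.where(observed, X - means, 0.0)

    n_jk = S.T @ S                           # joint observation counts
    if np.any(n_jk == 0):
        raise ValueError("a pair of variables is never observed together")

    cov = (centered.T @ centered) / n_jk
    return cov, n_jk


def hard_threshold(cov, n_jk, delta=2.0):
    """Zero out small off-diagonal entries (hypothetical universal threshold).

    Assumption: the threshold is delta * sqrt(log p / n_min), where n_min
    is the smallest joint observation count. delta is a tuning constant.
    """
    p = cov.shape[0]
    lam = delta * np.sqrt(np.log(p) / n_jk.min())
    out = np.where(np.abs(cov) >= lam, cov, 0.0)
    np.fill_diagonal(out, np.diag(cov))      # always keep the variances
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X[rng.random(X.shape) < 0.2] = np.nan    # missing completely at random
    cov, n_jk = generalized_sample_covariance(X)
    print(hard_threshold(cov, n_jk))
```

With complete data the first function reduces to the usual sample covariance with a 1/n normalization. In practice `delta` would be chosen by a data-driven procedure such as cross-validation.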


Figures

Figure 1. Weight matrix for the blockwise tridiagonal estimator.
Figure 2. Illustration of the ovarian cancer dataset. Black block = completely observed; white block = completely missing.
Figure 3. Heatmaps of the covariance matrix estimate with all the observed data.
Figure 4. Heatmaps of the covariance matrix estimate with additional missing values.

