J Am Stat Assoc. 2016;111(514):621-633.
doi: 10.1080/01621459.2015.1021005. Epub 2016 Aug 18.

Structured Matrix Completion with Applications to Genomic Data Integration


Tianxi Cai et al. J Am Stat Assoc. 2016.

Abstract

Matrix completion has attracted significant recent attention in many fields, including statistics, applied mathematics, and electrical engineering. The current literature on matrix completion focuses primarily on independent sampling models, under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix is observed. We provide theoretical justification for the proposed SMC method and derive a lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite samples under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extents of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.

Keywords: Constrained minimization; genomic data integration; low-rank matrix; matrix completion; singular value decomposition; structured matrix completion.
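
The structured setting in the abstract can be illustrated with a minimal sketch. It is not the authors' SMC procedure: it assumes an exactly low-rank matrix with a known rank and no noise, and it fills in the unobserved block with the standard identity A22 = A21 A11^+ A12, which holds when the observed corner block A11 has the same rank as A. The function name `complete_block`, the block names, and the simulated data are all hypothetical choices for illustration; the dimensions mirror the 30×30, m1 = m2 = 10 example in Figure 1.

```python
import numpy as np


def complete_block(A11, A12, A21, rank):
    """Estimate the unobserved block A22 from the observed row and column blocks.

    Assumption: A is exactly low-rank with known rank, and A11 has that rank.
    """
    # Truncated SVD of the observed corner block.
    U, s, Vt = np.linalg.svd(A11, full_matrices=False)
    U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank, :]
    # Rank-restricted pseudoinverse of A11.
    A11_pinv = Vt_r.T @ np.diag(1.0 / s_r) @ U_r.T
    return A21 @ A11_pinv @ A12


# Simulated rank-3 matrix; only the first m1 rows and first m2 columns are "observed".
rng = np.random.default_rng(0)
p1, p2, m1, m2, r = 30, 30, 10, 10, 3
A = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))

A11, A12 = A[:m1, :m2], A[:m1, m2:]
A21, A22 = A[m1:, :m2], A[m1:, m2:]

A22_hat = complete_block(A11, A12, A21, rank=r)
rel_err = np.linalg.norm(A22_hat - A22) / np.linalg.norm(A22)
print(f"relative Frobenius error: {rel_err:.2e}")
```

In this idealized case the recovery is exact up to floating-point error. The approximately low-rank case treated in the paper additionally requires choosing where to truncate the singular values, which this sketch sidesteps by taking the rank as given.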

Figures

Figure 1
Illustrative example with A ∈ ℝ^{30×30}, m1 = m2 = 10. (A darker block corresponds to larger magnitude.)
Figure 2
Searching for the appropriate position to truncate from = 10 to 1.
Figure 3
Spectral norm loss (left panel) and Frobenius norm loss (right panel) when there is a gap between σ_r(A) and σ_{r+1}(A). The singular values of A are given by (21), p1 = p2 = 1000, and m1 = m2 = 50.
Figure 4
Spectral norm loss (left panel) and Frobenius norm loss (right panel) as the thresholding constant c varies. The singular values of A are {jα, j = 1, 2, …} with α varying from 0.3 to 2, p1 = p2 = 1000, and m1 = m2 = 50.
Figure 5
Spectral and Frobenius norm losses with column/row thresholding. The singular values of A are {j^{-1}, j = 1, 2, …}, p1 = 300, p2 = 3000, and m1, m2 = 10, …, 150.
Figure 6
Comparison of the proposed SMC method with the NNM method with 5-fold cross-validation for the settings with singular values of A being {jα, j = 1, 2, …} for α ranging from 0.6 to 2, p1 = p2 = 500, and m1 = m2 = 50 or 100.
Figure 7
Imputation scheme for integrating multiple OC genomic studies.
