Linked matrix factorization

Michael J O'Connell¹, Eric F Lock²

Affiliations

¹ Department of Statistics, Miami University, Oxford, Ohio 45056.
² Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455.

PMID: 30516272
DOI: 10.1111/biom.13010

Michael J O'Connell et al. Biometrics. 2019 Jun;75(2):582-592. doi: 10.1111/biom.13010. Epub 2019 Apr 2.

Abstract

Several recent methods address the dimension reduction and decomposition of linked high-content data matrices. Typically, these methods consider one dimension, rows or columns, that is shared among the matrices. This shared dimension may represent common features measured for different sample sets (horizontal integration) or a common sample set with features from different platforms (vertical integration). We introduce an approach for simultaneous horizontal and vertical integration, Linked Matrix Factorization (LMF), for the general case where some matrices share rows (e.g., features) and some share columns (e.g., samples). Our motivating application is a cytotoxicity study with accompanying genomic and molecular chemical attribute data. The toxicity matrix (cell lines $\times$ chemicals) shares samples with a genotype matrix (cell lines $\times$ SNPs) and shares features with a molecular attribute matrix (chemicals $\times$ attributes). LMF gives a unified low-rank factorization of these three matrices, which allows for the decomposition of systematic variation that is shared and systematic variation that is specific to each matrix. This allows for efficient dimension reduction, exploratory visualization, and the imputation of missing data even when entire rows or columns are missing. We present theoretical results concerning the uniqueness, identifiability, and minimal parametrization of LMF, and evaluate it with extensive simulation studies.

Keywords: data integration; dimension reduction; massive data sets; missing data imputation; principal components analysis.
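
As a rough illustration of the linked structure described in the abstract, the sketch below fits a shared low-rank factorization to three matrices by alternating least squares. It is a minimal sketch under stated assumptions, not the authors' LMF algorithm. It assumes fully observed data and models only shared structure, with no matrix-specific components and no imputation. The function name `linked_factorization` and the factor names `U`, `V`, `Wy`, `Wz` are hypothetical. Here `X` (cell lines x chemicals) is approximated by `U @ V.T`, `Y` (cell lines x SNPs) by `U @ Wy.T`, and `Z` (chemicals x attributes) by `V @ Wz.T`, so `U` is shared between `X` and `Y`, and `V` is shared between `X` and `Z`.

```python
import numpy as np


def linked_factorization(X, Y, Z, rank, n_iter=100, seed=0):
    """Illustrative alternating least squares for three linked matrices.

    Assumed model (not taken from the paper):
        X ~ U @ V.T    (cell lines x chemicals)
        Y ~ U @ Wy.T   (cell lines x SNPs), shares row factors U with X
        Z ~ V @ Wz.T   (chemicals x attributes), shares chemical factors V with X
    Returns the factors and the loss after initialization and after each sweep.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(X.shape[0], rank))
    V = rng.normal(size=(X.shape[1], rank))
    Wy = rng.normal(size=(Y.shape[1], rank))
    Wz = rng.normal(size=(Z.shape[1], rank))

    def loss():
        # Sum of squared reconstruction errors over the three matrices.
        return (np.sum((X - U @ V.T) ** 2)
                + np.sum((Y - U @ Wy.T) ** 2)
                + np.sum((Z - V @ Wz.T) ** 2))

    losses = [loss()]
    for _ in range(n_iter):
        # U appears in X and Y: solve [X, Y] ~ U @ [V; Wy].T for U.
        B = np.vstack([V, Wy])
        C = np.hstack([X, Y])
        U = np.linalg.lstsq(B, C.T, rcond=None)[0].T
        # V appears in X and Z: solve [X.T, Z] ~ V @ [U; Wz].T for V.
        B = np.vstack([U, Wz])
        C = np.hstack([X.T, Z])
        V = np.linalg.lstsq(B, C.T, rcond=None)[0].T
        # Wy appears only in Y, and Wz only in Z.
        Wy = np.linalg.lstsq(U, Y, rcond=None)[0].T
        Wz = np.linalg.lstsq(V, Z, rcond=None)[0].T
        losses.append(loss())
    return U, V, Wy, Wz, losses
```

Each update solves an exact least-squares problem for one block of factors with the others held fixed, so the total squared error should not increase from one sweep to the next. Handling missing entries, or separating shared from matrix-specific variation as the abstract describes, would require extensions that this sketch does not attempt.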
