Mach Learn. 2024 Oct;113(10):7451-7477.
doi: 10.1007/s10994-024-06599-8. Epub 2024 Aug 7.

Empirical Bayes Linked Matrix Decomposition

Eric F Lock. Mach Learn. 2024 Oct.

Abstract

Data for several applications in diverse fields can be represented as multiple matrices that are linked across rows or columns. This is particularly common in molecular biomedical research, in which multiple molecular "omics" technologies may capture different feature sets (e.g., corresponding to rows in a matrix) and/or different sample populations (corresponding to columns). This has motivated a large body of work on integrative matrix factorization approaches that identify and decompose low-dimensional signal that is shared across multiple matrices or specific to a given matrix. We propose an empirical variational Bayesian approach to this problem that has several advantages over existing techniques, including the flexibility to accommodate shared signal over any number of row or column sets (i.e., bidimensional integration), an intuitive model-based objective function that yields appropriate shrinkage for the inferred signals, and a relatively efficient estimation algorithm with no tuning parameters. A general result establishes conditions for the uniqueness of the underlying decomposition for a broad family of methods that includes the proposed approach. For scenarios with missing data, we describe an associated iterative imputation approach that is novel for the single-matrix context and a powerful approach for "blockwise" imputation (in which an entire row or column is missing) in various linked matrix contexts. Extensive simulations show that the method performs very well under different scenarios with respect to recovering underlying low-rank signal, accurately decomposing shared and specific signals, and accurately imputing missing data. The approach is applied to gene expression and miRNA data from breast cancer tissue and normal breast tissue, for which it gives an informative decomposition of variation and outperforms alternative strategies for missing data imputation.
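The general idea behind iterative low-rank imputation can be illustrated with a generic loop: fill the missing entries, fit a shrunken low-rank approximation, overwrite only the missing entries with that fit, and repeat. The sketch below is a soft-impute style stand-in, not the empirical variational Bayes algorithm proposed here. It uses a fixed soft-threshold `lam` on the singular values, which is a tuning parameter the proposed approach avoids. The function name `soft_impute` and the simulated rank-1 data are hypothetical.

```python
import numpy as np

def soft_impute(X, lam, max_iter=200, tol=1e-6):
    """Generic iterative low-rank imputation (soft-impute style sketch).

    X   : 2-D array with np.nan marking missing entries.
    lam : fixed soft-threshold for the singular values (hypothetical tuning
          parameter; the empirical Bayes approach in the paper has none).
    Returns the low-rank estimate and the data with missing entries filled.
    """
    observed = ~np.isnan(X)
    # Initialize missing entries with the mean of the observed entries.
    Z = np.where(observed, X, np.nanmean(X))
    S_hat = np.zeros_like(Z)
    for _ in range(max_iter):
        # Shrunken low-rank fit: SVD with soft-thresholded singular values.
        U, d, Vt = np.linalg.svd(Z, full_matrices=False)
        S_new = (U * np.maximum(d - lam, 0.0)) @ Vt
        # Keep the observed data; replace only the missing entries.
        Z = np.where(observed, X, S_new)
        converged = np.linalg.norm(S_new - S_hat) <= tol * max(np.linalg.norm(S_new), 1.0)
        S_hat = S_new
        if converged:
            break
    return S_hat, Z

# Hypothetical example: rank-1 signal plus unit-variance noise, ~10% entries missing.
rng = np.random.default_rng(0)
S = 5.0 * rng.normal(size=(50, 1)) @ rng.normal(size=(1, 40))
X = S + rng.normal(size=S.shape)
missing = rng.random(S.shape) < 0.1
X_obs = np.where(missing, np.nan, X)

S_hat, X_filled = soft_impute(X_obs, lam=15.0)
# Relative squared error of the imputed values on the missing entries only.
rse_miss = np.sum((S_hat - S)[missing] ** 2) / np.sum(S[missing] ** 2)
```

A single-matrix loop like this has no information about a column that is entirely missing, so it cannot meaningfully handle blockwise missingness; that case is where linking to other matrices that share the column becomes useful.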

Keywords: Data integration; dimension reduction; low-rank factorization; missing data imputation; variational Bayes.

Conflict of interest statement

Conflicts of interest/Competing interests: None declared.

Figures

Fig. 1
Error in estimating the underlying low-rank signal for a single matrix under different methods and different signal-to-noise (s2n) ratios. The left panel gives the relative squared error (RSE), and the right panel gives the oracle normalized standard error (ONSE). All axes are on a log scale.
Fig. 2
Error in estimating the underlying low-rank structure when the rank-1 components have heterogeneous signal sizes. The left panel gives the relative squared error (RSE), and the right panel gives the oracle normalized standard error (ONSE). All axes are on a log scale.
Fig. 3
Missing data imputation accuracy for different levels of missingness. The left column gives RSE_miss and the right column gives ONSE_miss.
Fig. 4
Error in estimating the underlying low-rank signal for two linked matrices under different signal-to-noise (s2n) ratios. The left panel gives the RSE, and the right panel gives the RDSE. All axes are on a log scale.
Fig. 5
RSE (left) and RDSE (right) for low-rank structure with heterogeneous signal levels for two linked matrices.
Fig. 6
RSE (left) and RDSE (right) for the scenario with 2 × 2 bidimensionally linked matrices.
Fig. 7
RSE for entrywise missing data imputation (left) and blockwise missing data imputation (right), for the scenario with 2 × 2 bidimensionally linked matrices.
Fig. 8
Heatmaps of the BRCA data (left) and the full low-rank structure estimated by EV-BIDIFAC (right). Higher values are colored red, lower values blue, and missing columns black.
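The relative squared error (RSE) reported in Figs. 1, 2 and 4 to 7 is assumed here to take the usual form ||S_hat - S||_F^2 / ||S||_F^2, since the captions do not define it. The sketch computes it for a hypothetical pair of matrices linked by shared columns (samples), each containing a rank-1 joint component built on common column scores `V` and a rank-1 individual component. The estimator is a plain rank-3 truncated SVD of the stacked data, used only as a baseline; it is not EV-BIDIFAC, and all names and dimensions are illustrative.

```python
import numpy as np

def rse(S_hat, S):
    """Relative squared error ||S_hat - S||_F^2 / ||S||_F^2 (assumed definition)."""
    return np.sum((S_hat - S) ** 2) / np.sum(S ** 2)

def truncated_svd(X, r):
    """Best rank-r approximation of X in Frobenius norm."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * d[:r]) @ Vt[:r]

# Hypothetical simulation: two matrices sharing the same n columns.
rng = np.random.default_rng(1)
n, p1, p2 = 100, 30, 20
V = rng.normal(size=(n, 1))                                # joint column scores
J1 = rng.normal(size=(p1, 1)) @ V.T                        # shared signal, matrix 1
J2 = rng.normal(size=(p2, 1)) @ V.T                        # shared signal, matrix 2
A1 = rng.normal(size=(p1, 1)) @ rng.normal(size=(1, n))    # specific to matrix 1
A2 = rng.normal(size=(p2, 1)) @ rng.normal(size=(1, n))    # specific to matrix 2
S = np.vstack([J1 + A1, J2 + A2])                          # stacked signal, rank <= 3
X = S + 0.5 * rng.normal(size=S.shape)                     # add Gaussian noise

baseline_rse = rse(truncated_svd(X, r=3), S)
```

This baseline recovers only the overall low-rank signal; separating it into shared and matrix-specific parts, which the decomposition error (RDSE) in Figs. 4 to 6 evaluates, requires a linked decomposition such as the one proposed.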
